Achieve high performance in terms of data replication and consistency by
estimating the optimal database size to run a Cassandra cluster.
Before you begin: Obtain the sizing calculation tool by sending an email to
- On a production system on which you want to run a Cassandra cluster, select at
least three nodes.
Note: You can run multiple nodes on the same server provided that each node has
a different IP address.
- In the sizing calculation tool, in the fields highlighted in red, provide the
required information about record size for each of the following decision
management services:
- In the DDS_Data_Sizing tab, provide information
about Decision Data Store (DDS), such as the number of records and the
average record key size.
- In the Delayed_Learning_Sizing tab, provide
information about delayed learning of adaptive models, such as the number
of decisions per minute and the average record key size.
For more information, see the Delayed learning of adaptive
models article on Pega Community.
- In the VBD_Sizing tab, provide information about
business monitoring and reporting, such as the number of dimensions and
- In the Model_Response_Sizing tab, provide
information about collecting the responses to your adaptive models, such
as the number of incoming responses in 24 hours.
- Calculate the required database size for your Cassandra cluster by summing up
the values of the Total required disk space fields from
- Ensure that you have enough disk space to run the DDS data sets: divide the
database size that you calculated in the previous step by the number of available
nodes, and verify that the size of each node does not exceed 50% of the database size.
- If you use the cluster for simulations and data flow runs, increase processing
speed by adding nodes to the cluster.
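
The arithmetic in the calculation steps above can be sketched as follows. This is a minimal illustration, not part of the sizing calculation tool: the function names, the dictionary keys, and the sample values in gigabytes are all hypothetical. The 50% check takes its reference size as an explicit parameter; the example passes the database size, as the step states.

```python
from typing import Mapping

MIN_NODES = 3  # the procedure calls for at least three nodes


def total_database_size(tab_totals: Mapping[str, float]) -> float:
    """Sum the 'Total required disk space' value taken from each tab."""
    return sum(tab_totals.values())


def per_node_size(database_size: float, node_count: int) -> float:
    """Divide the database size evenly across the available nodes."""
    if node_count < MIN_NODES:
        raise ValueError(f"expected at least {MIN_NODES} nodes, got {node_count}")
    return database_size / node_count


def within_limit(node_size: float, reference_size: float,
                 max_fraction: float = 0.5) -> bool:
    """Check that the per-node size stays within a fraction of a reference size."""
    return node_size <= max_fraction * reference_size


if __name__ == "__main__":
    # Hypothetical totals in GB, one entry per tab of the sizing tool.
    totals = {
        "DDS_Data_Sizing": 120.0,
        "Delayed_Learning_Sizing": 30.0,
        "VBD_Sizing": 45.0,
        "Model_Response_Sizing": 5.0,
    }
    database_size = total_database_size(totals)
    node_size = per_node_size(database_size, node_count=4)
    print(f"database size: {database_size} GB, per node: {node_size} GB")
    print("within limit:", within_limit(node_size, database_size))
```

Keeping the three calculations in separate functions makes it easy to rerun the check with a different node count, for example after adding nodes for simulations and data flow runs.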