Sizing a Cassandra cluster

Achieve high performance in terms of data replication and consistency by estimating the optimal database size to run a Cassandra cluster.
Before you begin: Obtain the sizing calculation tool by sending an email to [email protected].
  1. On a production system on which you want to run a Cassandra cluster, select at least three nodes.
    Note: You can run multiple nodes on the same server provided that each node has a different IP address.
  2. In the sizing calculation tool, fill in the fields highlighted in red with the required record size information for each of the following decision management services:
    1. In the DDS_Data_Sizing tab, provide information about Decision Data Store (DDS), such as the number of records and the average record key size.
    2. In the Delayed_Learning_Sizing tab, provide information about the delayed learning of adaptive models, such as the number of decisions per minute and the average record key size.
      For more information, see the Delayed learning of adaptive models article on Pega Community.
    3. In the VBD_Sizing tab, provide information about business monitoring and reporting, such as the number of dimensions and measurements.
      For more information, see Visual Business Director planner.
    4. In the Model_Response_Sizing tab, provide information about collecting the responses to your adaptive models, such as the number of incoming responses in 24 hours.
      For more information, see Adaptive analytics.
  3. Calculate the required database size for your Cassandra cluster by summing up the values of the Total required disk space fields from each tab.
  4. Ensure that you have enough disk space to run the DDS data sets: divide the database size that you calculated in step 3 by the number of available nodes, and verify that the resulting size of each node does not exceed 50% of the database size.
  5. If you use the cluster for simulations and data flow runs, increase processing speed by adding nodes to the cluster.
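
The arithmetic in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not part of the sizing calculation tool: the function name size_cluster, the max_share parameter, and the per-tab figures are hypothetical, and the 50% check is applied literally as step 4 words it (per-node size compared against the total database size).

```python
def size_cluster(tab_totals_gb, node_count, max_share=0.5):
    """Sum the per-tab totals (step 3) and check the per-node size (step 4).

    tab_totals_gb: hypothetical mapping of tab name to the value of its
                   "Total required disk space" field, in GB.
    node_count:    number of available Cassandra nodes (step 1 asks for >= 3).
    max_share:     assumed reading of the 50% limit from step 4.
    """
    if node_count < 3:
        raise ValueError("select at least three nodes")
    # Step 3: required database size is the sum across all tabs.
    total_gb = sum(tab_totals_gb.values())
    # Step 4: divide the database size by the number of available nodes.
    per_node_gb = total_gb / node_count
    # Step 4: the per-node size must not exceed 50% of the database size.
    within_limit = per_node_gb <= max_share * total_gb
    return total_gb, per_node_gb, within_limit


# Made-up example values; replace them with the figures from your own tabs.
example_totals = {
    "DDS_Data_Sizing": 120.0,
    "Delayed_Learning_Sizing": 30.0,
    "VBD_Sizing": 45.0,
    "Model_Response_Sizing": 5.0,
}
total_gb, per_node_gb, within_limit = size_cluster(example_totals, node_count=4)
print(f"database: {total_gb} GB, per node: {per_node_gb} GB, ok: {within_limit}")
```

With these made-up inputs, the database size is 200 GB and each of the four nodes holds 50 GB. Adding nodes, as in step 5, lowers the per-node figure further.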