
This content has been archived and is no longer being updated.

Links may not function; however, this content may be relevant to outdated versions of the product.

Impact of failing nodes on system stability

Updated on March 11, 2021

This content applies only to On-premises and Client-managed cloud environments

Learn how the number of functional nodes and the current replication factor affect system stability when some of the Cassandra nodes are down.

The replication factor is the number of copies of each record that Cassandra keeps, each on a different node. The default replication factor is 3, which means that if three or more nodes fail, some data becomes unavailable. At the time of a write operation on a record, Cassandra determines which nodes will own the copies of that record. If all three of those nodes are unavailable, the write operation fails and Cassandra writes the Unable to achieve consistency level ONE error to its logs.

When three or more nodes are unavailable, some write operations succeed and some fail after a period of several seconds, depending on whether at least one node that owns the record is still available. These delayed failures increase the overall write time and are the root cause of multiple failures. If an application that performs write operations to a Decision Data Store (DDS) data set does not handle write failures, the system might seem to be functioning correctly, only with a prolonged response time, while some writes are in fact failing.
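The mix of successful and failed writes described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the copies of a record sit on consecutive nodes of the ring; real Cassandra placement depends on tokens and the replication strategy. The function names and the six-node cluster are hypothetical and are not part of Cassandra or Pega Platform.

```python
def replica_nodes(first_owner, node_count, replication_factor):
    """Return the node indices that hold copies of one record.

    Simplifying assumption: copies sit on consecutive nodes of the ring,
    starting at first_owner. Real placement depends on tokens and strategy.
    """
    return [(first_owner + i) % node_count for i in range(replication_factor)]


def write_succeeds_at_one(first_owner, node_count, replication_factor, failed_nodes):
    """A write at consistency level ONE needs at least one live owner node."""
    owners = replica_nodes(first_owner, node_count, replication_factor)
    return any(node not in failed_nodes for node in owners)


# Hypothetical cluster: 6 nodes, replication factor 3, nodes 0, 1, and 2 down.
failed = {0, 1, 2}
outcomes = {owner: write_succeeds_at_one(owner, 6, 3, failed) for owner in range(6)}
print(outcomes)
# {0: False, 1: True, 2: True, 3: True, 4: True, 5: True}
```

In this sketch, only records whose three copies all fall on the failed nodes cannot be written; writes to other records still succeed, which is why the system can appear to work while some operations fail.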

Therefore, activities that perform write operations to DDS through the DataSet-Execute method must include a StepStatusFail check in the transition of that step. In addition, the number of failed nodes should never exceed the replication factor minus 1 (replication_factor - 1). Otherwise, the system might behave incorrectly; for example, some write or read operations might fail. If the failed nodes do not become functional again, data might be permanently lost.

You can prevent data loss by determining the maximum number of nodes that you can afford to have down at the same time (N), and setting the replication factor to N+1.
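The N+1 rule and the replication_factor - 1 limit can be expressed as a short sketch. The function names below are hypothetical helpers for illustration only; they are not Pega Platform or Cassandra settings.

```python
def required_replication_factor(max_nodes_down):
    """Replication factor needed to tolerate max_nodes_down (N) failed nodes: N + 1."""
    if max_nodes_down < 0:
        raise ValueError("max_nodes_down must not be negative")
    return max_nodes_down + 1


def within_tolerance(failed_nodes, replication_factor):
    """True while the number of failed nodes does not exceed replication_factor - 1."""
    return failed_nodes <= replication_factor - 1


# Tolerating 2 nodes down at the same time requires a replication factor of 3.
print(required_replication_factor(2))   # 3
print(within_tolerance(2, 3))           # True
print(within_tolerance(3, 3))           # False
```

With the default replication factor of 3, two failed nodes are still within tolerance, while a third failure puts some records at risk.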

Note: Increasing the replication factor impacts the response times for read and write operations.

