Repairing and cleaning Cassandra nodes for data consistency
This content applies only to On-premises and Client-managed cloud environments
To guarantee data consistency and cluster-wide data health, run a Cassandra repair and cleanup regularly, even when all nodes in the services infrastructure are continuously available. Regular Cassandra repair operations are especially important when data is explicitly deleted or written with a TTL value.
Schedule and perform repairs and cleanups during low-usage hours because they might affect system performance.
When using the NetworkTopologyStrategy, Cassandra is informed about the cluster topology and each cluster node is assigned to a rack (or Availability Zone in AWS Cloud systems). Cassandra ensures that data written to the cluster is evenly distributed across the racks. When the replication factor is equal to the number of racks, each rack contains a full copy of all the data. With the default replication factor of 3 and a cluster of 3 racks, you can use this allocation to optimize repairs.
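For example, a keyspace replicated in this way might be defined as follows. This is a minimal illustration; the keyspace name my_keyspace and the data center name dc1 are assumptions, so substitute your own values:

    # Create a keyspace that uses NetworkTopologyStrategy with 3 replicas
    # in data center dc1; with 3 racks, each rack holds one full replica.
    cqlsh -e "CREATE KEYSPACE IF NOT EXISTS my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"

    # Verify how the replicas are distributed across racks for the keyspace:
    nodetool status my_keyspace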
- At least once a week, schedule incremental repairs by using the following nodetool command: nodetool repair -inc -par (a sample scheduling script follows this list).
- Optional: Check the progress of the repair operation by entering: nodetool compactionstats. For more information about troubleshooting repairs, see the "Troubleshooting hanging repairs" article in the DataStax documentation.
- If a node joins the cluster after more than one hour of unavailability, run the repair and cleanup activities (see the sketch after this list):
- In the nodetool utility, enter: nodetool repair
- In the nodetool utility, enter: nodetool cleanup
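To run the weekly incremental repair during low-usage hours, you can wrap the command in a small script and schedule it with cron. The following is a minimal sketch, not a supported utility; the script path, log path, and schedule are assumptions to adapt to your environment:

    #!/bin/sh
    # Run an incremental, parallel repair and record the outcome.
    LOG=/var/log/cassandra/scheduled-repair.log
    echo "$(date) starting incremental repair" >> "$LOG"
    nodetool repair -inc -par >> "$LOG" 2>&1
    echo "$(date) repair finished with exit code $?" >> "$LOG"

    # Example crontab entry: run the script every Sunday at 02:00.
    # 0 2 * * 0 /usr/local/bin/cassandra-weekly-repair.sh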
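When a node rejoins after an extended outage, the repair and cleanup commands can be run in sequence on that node. The following sketch stops if the repair fails; deferring cleanup until the repair succeeds is a conservative choice made for this example, not a product requirement:

    #!/bin/sh
    # Restore consistency on a rejoined node: repair first, then clean up.
    if nodetool repair; then
        nodetool cleanup
    else
        echo "Repair failed; skipping cleanup." >&2
        exit 1
    fi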