Nodetool commands for monitoring Cassandra clusters

Verify the system health by using the nodetool utility. This utility comes as part of the Pega Platform deployment by default. The following list contains the most useful commands that you can use to assess the cluster health along with sample outputs. For more information about the nodetool utility, see the Apache Cassandra documentation.
nodetool status
This command retrieves an overview of the cluster health, for example:
Datacenter: datacenter1
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.123.2.59 1.1 TB 256 34.3% f4a8e5c3-b5be-40e8-bdbd-326c6ff54558 1c 
UN 10.123.2.74 937.92 GB 256 29.4% c097b89d-4aae-4803-be2f-8073062517bf 1d 
UN 10.123.2.13 1.18 TB 256 34.8% 047c7136-f385-458d-bf22-7e17ecad1ce2 1a 
UN 10.123.2.28 1.03 TB 256 32.7% a24abd86-1afa-4225-b93d-787e164ddcb2 1a 
UN 10.123.2.44 1016.13 GB 256 32.5% 4aa4dc44-2f23-4a60-8e51-ce959fd4c47d 1c 
UN 10.123.2.83 1.03 TB 256 33.4% 5aeab110-3f9a-4a17-a553-7f90ca31cd0e 1d 
UN 10.123.2.18 1.26 TB 256 32.6% 9fbf041a-952c-4709-820c-b2444c8410f3 1a 
UN 10.123.2.81 1.27 TB 256 37.2% cc0d9584-f461-4870-a7d7-225d5fc5c79d 1d 
UN 10.123.2.39 1.09 TB 256 33.2% 2a6dc514-3178-44af-997e-cae9d337d172 1c
Healthy nodes return the following parameters:
  • The node status is UN (up and normal).
  • The Owns (effective) value is roughly the same for each node.
  • The percentage of data that each node manages is similar, which indicates a good data spread across the cluster members and across multiple data centers. The expected value depends on the replication factor. For example, with a replication factor of 3, the ownership is approximately 50 percent per node in a six-node cluster, and approximately 33 percent per node in a nine-node cluster such as the one in the sample output.
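The checks in the list above can be automated. The following Python code is a minimal sketch, not part of Pega Platform or Cassandra. It assumes that the nodetool status output has the same columns as the sample above. The names parse_status, check_status, and expected_ownership, the 5 percentage point tolerance, and the replication factor argument are assumptions made for illustration.

```python
import re

# One node row of `nodetool status`, for example:
# UN 10.123.2.59 1.1 TB 256 34.3% f4a8e5c3-... 1c
ROW = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<address>\S+)\s+"
    r"(?P<load>\S+\s+\S+)\s+(?P<tokens>\d+)\s+"
    r"(?P<owns>[\d.]+)%\s+(?P<host_id>\S+)\s+(?P<rack>\S+)$"
)


def parse_status(text):
    """Return one dict per node row found in the status output."""
    nodes = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            node = match.groupdict()
            node["owns"] = float(node["owns"])
            nodes.append(node)
    return nodes


def expected_ownership(node_count, replication_factor):
    """Ideal effective ownership per node, in percent."""
    return replication_factor / node_count * 100


def check_status(nodes, tolerance=5.0):
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    for node in nodes:
        if node["state"] != "UN":
            warnings.append(f"{node['address']} is {node['state']}, not UN")
    if nodes:
        mean = sum(node["owns"] for node in nodes) / len(nodes)
        for node in nodes:
            # Assumed rule: flag nodes that deviate from the mean ownership.
            if abs(node["owns"] - mean) > tolerance:
                warnings.append(
                    f"{node['address']} owns {node['owns']}%, "
                    f"cluster mean is {mean:.1f}%"
                )
    return warnings
```

To use the sketch, pass the captured command output to parse_status and then pass the result to check_status. Adjust the tolerance to the spread that is acceptable in your cluster.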
nodetool tpstats
This command retrieves a list of active and pending tasks, for example:
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 517093808 0 0
ReadStage 0 0 60651127 0 0
RequestResponseStage 0 0 371026355 0 0
ReadRepairStage 0 0 5530147 0 0
CounterMutationStage 0 0 0 0 0
MiscStage 0 0 0 0 0
AntiEntropySessions 0 0 77061 0 0
HintedHandoff 0 0 12 0 0
GossipStage 0 0 4927463 0 0
CacheCleanupExecutor 0 0 0 0 0
InternalResponseStage 0 0 1092 0 0
CommitLogArchiver 0 0 0 0 0
CompactionExecutor 0 0 2217092 0 0
ValidationExecutor 0 0 1199227 0 0
MigrationStage 0 0 0 0 0
AntiEntropyStage 0 0 8193502 0 0
PendingRangeCalculator 0 0 13 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 148703 0 0
MemtablePostFlush 0 0 1378763 0 0
MemtableReclaimMemory 0 0 148703 0 0
Native-Transport-Requests 0 0 498700597 0 2131
The following values are important for evaluating various aspects of the cluster health:
  • MutationStage for Cassandra write operations.
  • ReadStage for Cassandra read operations.
  • CompactionExecutor for compaction operations.
  • Native-Transport-Requests for CQL requests from clients.
An increased number of pending tasks indicates that Cassandra is not processing the requests fast enough. You can configure the nodetool tpstats command as a cron job to run periodically and collect load data from each node.
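A script that a cron job runs could look like the following Python code. This is a minimal sketch under stated assumptions: nodetool is on the PATH of the user that runs the script, the output has the five numeric columns shown in the sample above, and the names run_tpstats, parse_tpstats, and pending_alerts, as well as the default threshold of 0, are hypothetical.

```python
import subprocess

FIELDS = ("active", "pending", "completed", "blocked", "all_time_blocked")

# The pools that the list above calls out as most important.
WATCHED = (
    "MutationStage",
    "ReadStage",
    "CompactionExecutor",
    "Native-Transport-Requests",
)


def run_tpstats(nodetool="nodetool"):
    """Run `nodetool tpstats` and return its standard output as text."""
    result = subprocess.run(
        [nodetool, "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_tpstats(text):
    """Map each pool name to a dict of its five counters."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # A pool row is a name followed by exactly five integer columns.
        # The header row and any other lines are skipped.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            pools[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return pools


def pending_alerts(pools, threshold=0, watched=WATCHED):
    """Return the watched pools whose pending count exceeds the threshold."""
    return {
        name: pools[name]["pending"]
        for name in watched
        if name in pools and pools[name]["pending"] > threshold
    }
```

For example, pending_alerts(parse_tpstats(run_tpstats())) returns an empty dictionary when none of the watched pools has pending tasks. A single non-zero sample is not necessarily a problem, so compare the values that the job collects over time.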
nodetool compactionstats
This command shows whether Cassandra is processing compactions fast enough, for example:
root@ip-10-123-2-18:/usr/local/tomcat/cassandra/bin# ./nodetool compactionstats
pending tasks: 2
compaction type keyspace table completed total unit progress
Compaction data customer_b01be157931bcbfa32b7f240a638129d 744838490 883624752 bytes 84.29%
Active compaction remaining time : 0h00m00s
If the output consistently shows pending tasks while Cassandra is already running the maximum allowed number of concurrent compactions, compaction is not keeping up and the number of SSTables is growing. An increased number of SSTables results in higher read latencies.
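Because a single reading says little, it helps to compare several samples taken over time. The following Python code is a minimal sketch of that idea. The names pending_compactions and consistently_backlogged are hypothetical, and the limit argument is an assumption: set it to a value that reflects the number of concurrent compactions that your configuration allows.

```python
import re


def pending_compactions(text):
    """Return the `pending tasks` count from compactionstats output."""
    match = re.search(r"^pending tasks:\s*(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def consistently_backlogged(samples, limit):
    """Return True if every sample is at or above the limit.

    `samples` is a list of pending task counts collected over time.
    An empty list, or a sample that could not be parsed (None),
    is treated as "not backlogged".
    """
    return bool(samples) and all(
        sample is not None and sample >= limit for sample in samples
    )
```

For example, collect pending_compactions for each run of the command over an hour, and then pass the list to consistently_backlogged.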
nodetool info
This command retrieves the key cache, heap, and off-heap usage statistics, for example:
root@ip-10-123-2-18:/usr/local/tomcat/cassandra/bin# ./nodetool info
ID : 9fbf041a-952c-4709-820c-b2444c8410f3
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 1.26 TB
Generation No : 1543592679
Uptime (seconds) : 1655643
Heap Memory (MB) : 4864.30 / 12128.00
Off Heap Memory (MB) : 1840.39
Data Center : us-east
Rack : 1a
Exceptions : 56
Key Cache : entries 3647307, size 299.36 MB, capacity 300 MB, 81270677 hits, 341533804 requests, 0.238 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token : (invoke with -T/--tokens to see all 256 tokens)
If the key cache size is close to its capacity, as in this example (299.36 MB of 300 MB), consider increasing the key cache capacity.
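The comparison of key cache size and capacity can be computed from the Key Cache line. The following Python code is a minimal sketch that assumes the line has the same layout as in the sample above. The name key_cache_usage and the unit table are assumptions made for illustration.

```python
import re

# Multipliers for the size units that can appear in the Key Cache line.
UNITS = {
    "bytes": 1,
    "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3,
    "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3,
}

KEY_CACHE = re.compile(
    r"Key Cache\s*:\s*entries\s+(\d+),\s*size\s+([\d.]+)\s+(\w+),\s*"
    r"capacity\s+([\d.]+)\s+(\w+),\s*(\d+)\s+hits,\s*(\d+)\s+requests"
)


def key_cache_usage(text):
    """Return the fill ratio and overall hit rate of the key cache.

    Returns None if the text has no Key Cache line in the expected layout.
    """
    match = KEY_CACHE.search(text)
    if match is None:
        return None
    size = float(match.group(2)) * UNITS[match.group(3)]
    capacity = float(match.group(4)) * UNITS[match.group(5)]
    hits, requests = int(match.group(6)), int(match.group(7))
    return {
        # Size as a fraction of capacity; close to 1.0 means the cache is full.
        "fill_ratio": size / capacity if capacity else 0.0,
        # Hits over all requests since the counters were last reset.
        "hit_rate": hits / requests if requests else None,
    }
```

A fill ratio close to 1.0 corresponds to the case described above, in which the key cache size and capacity are roughly the same.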
nodetool cfstats or nodetool tablestats
The nodetool tablestats name is valid starting from Cassandra version 3. This command identifies the tables in which the number of SSTables is growing, and shows disk latencies and the number of tombstones read per query, for example:
Table: customer_b01be157931bcbfa32b7f240a638129d
SSTable count: 10
Space used (live): 30627181576
Space used (total): 30627181576
Space used by snapshots (total): 0
Off heap memory used (total): 92412446
SSTable Compression Ratio: 0.1259434714106204
Number of keys (estimate): 31569551
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 9436525
Local read latency: 2.237 ms
Local write count: 30788503 Local write latency: 0.015 ms
Pending flushes: 0
Bloom filter false positives: 2220
Bloom filter false ratio: 0.00000
Bloom filter space used: 57390568
Bloom filter off heap memory used: 57390488
Index summary off heap memory used: 6246878
Compression metadata off heap memory used: 28775080
Compacted partition minimum bytes: 5723
Compacted partition maximum bytes: 6866
Compacted partition mean bytes: 6866
Average live cells per slice (last five minutes): 0.9993731802755781
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
A high number of SSTables (for example, over 100) reduces read performance. Healthy systems typically have a maximum of around 25 SSTables per table. In a system where records are deleted often, the number of tombstones read per query can result in higher read latencies.
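The SSTable thresholds above can be applied to the command output with a short script. The following Python code is a minimal sketch that assumes the output uses the Table: and SSTable count: labels shown in the sample. The names sstable_counts and classify, and the label elevated for the range between the two thresholds, are assumptions made for illustration.

```python
def sstable_counts(text):
    """Map each table name to its SSTable count."""
    counts = {}
    table = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Table:"):
            table = line.split(":", 1)[1].strip()
        elif line.startswith("SSTable count:") and table is not None:
            counts[table] = int(line.split(":", 1)[1])
    return counts


def classify(count, healthy_max=25, high_min=100):
    """Label a count by using the thresholds from the text above."""
    if count > high_min:
        return "high"
    if count > healthy_max:
        return "elevated"
    return "healthy"
```

Tables with the same name in different keyspaces overwrite each other in this sketch, so run the command for one keyspace at a time or extend the key with the keyspace name.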
Cassandra creates a new SSTable when the data of a column family in the memtable is flushed to disk. Cassandra stores the SSTable files of a column family in the corresponding column family directory. The data in an SSTable is organized in six types of component files. SSTable component file names have the following format: keyspace-column family-[tmp marker]-version-generation-component.db
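The file name format above can be split into its parts as follows. This Python code is a minimal sketch that follows only the format stated in this article. It assumes that the optional tmp marker is the literal string tmp and that keyspace and column family names contain no hyphens. The function name parse_component_name and the file names in the usage note are hypothetical. File names in your Cassandra version might follow a different layout.

```python
def parse_component_name(filename):
    """Split an SSTable component file name into its parts.

    Expected layout, as described in the text:
    keyspace-column family-[tmp marker]-version-generation-component.db
    """
    if not filename.endswith(".db"):
        raise ValueError(f"not a .db component file: {filename}")
    parts = filename[: -len(".db")].split("-")
    # Assumption: the optional marker is the literal string "tmp".
    temporary = len(parts) == 6 and parts[2] == "tmp"
    if temporary:
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected file name layout: {filename}")
    keyspace, column_family, version, generation, component = parts
    return {
        "keyspace": keyspace,
        "column_family": column_family,
        "temporary": temporary,
        "version": version,
        "generation": int(generation),
        "component": component,
    }
```

For example, for the hypothetical file name ks-tbl-jb-12-Data.db, the function returns the keyspace ks, the column family tbl, the version jb, the generation 12, and the component Data.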
nodetool cfhistograms keyspace tablename or nodetool tablehistograms keyspace tablename
The nodetool tablehistograms name is valid starting from Cassandra version 3. This command provides further information about tables with high latencies, for example:

data/customer_b01be157931bcbfa32b7f240a638129d histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         2.00      0.00           1916.00       6866            2
75%         3.00      0.00           2759.00       6866            2
95%         3.00      0.00           4768.00       6866            2
98%         4.00      0.00           6866.00       6866            2
99%         4.00      0.00           8239.00       6866            2
Min         0.00      0.00           15.00         5723            2
Max         6.00      0.00           25109160.00   6866            2