Nodetool commands for monitoring Cassandra clusters
This content applies only to On-premises and Client-managed cloud environments
Verify the system health by using the nodetool utility. This utility comes with the Pega Platform deployment by default. For more information about the nodetool utility, see the Apache Cassandra documentation.
nodetool status
- This command retrieves an overview of the cluster health, for
example:
Datacenter: datacenter1
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.123.2.59  1.1 TB      256     34.3%             f4a8e5c3-b5be-40e8-bdbd-326c6ff54558  1c
UN  10.123.2.74  937.92 GB   256     29.4%             c097b89d-4aae-4803-be2f-8073062517bf  1d
UN  10.123.2.13  1.18 TB     256     34.8%             047c7136-f385-458d-bf22-7e17ecad1ce2  1a
UN  10.123.2.28  1.03 TB     256     32.7%             a24abd86-1afa-4225-b93d-787e164ddcb2  1a
UN  10.123.2.44  1016.13 GB  256     32.5%             4aa4dc44-2f23-4a60-8e51-ce959fd4c47d  1c
UN  10.123.2.83  1.03 TB     256     33.4%             5aeab110-3f9a-4a17-a553-7f90ca31cd0e  1d
UN  10.123.2.18  1.26 TB     256     32.6%             9fbf041a-952c-4709-820c-b2444c8410f3  1a
UN  10.123.2.81  1.27 TB     256     37.2%             cc0d9584-f461-4870-a7d7-225d5fc5c79d  1d
UN  10.123.2.39  1.09 TB     256     33.2%             2a6dc514-3178-44af-997e-cae9d337d172  1c
- In a healthy cluster, the output shows the following:
- The status of each node is UN (up and normal).
- The Owns (effective) value, which is the percentage of data that each node manages, is roughly the same for every node. Similar values indicate a good data spread across the cluster members and across multiple data centers. For example, in a six-node cluster with a replication factor of 3, the effective ownership is approximately 50 percent per node.
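You can automate the two checks above. The following is a minimal sketch. It assumes Python 3.7 or later and that the text of nodetool status has already been captured. The check_status function flags any node whose state is not UN. It also flags the cluster when the Owns (effective) values differ by more than max_spread_pct percentage points. The function name and the 10-point default are assumptions that stand in for "roughly the same"; they are not values from Pega Platform or Cassandra.

```python
from __future__ import annotations

import re

# One node row of `nodetool status`, for example:
# UN  10.123.2.59  1.1 TB  256  34.3%  f4a8e5c3-b5be-40e8-bdbd-326c6ff54558  1c
NODE_ROW = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<address>\S+)\s+(?P<load>.+?)\s+"
    r"(?P<tokens>\d+)\s+(?P<owns>[\d.]+)%\s+"
    r"(?P<host_id>[0-9a-f-]{36})\s+(?P<rack>\S+)\s*$"
)


def check_status(output: str, max_spread_pct: float = 10.0) -> list[str]:
    """Return a list of problems found in `nodetool status` output.

    max_spread_pct is an assumed threshold: the largest allowed gap, in
    percentage points, between the highest and lowest Owns (effective) value.
    """
    problems: list[str] = []
    owns: dict[str, float] = {}
    for line in output.splitlines():
        row = NODE_ROW.match(line.strip())
        if row is None:
            continue  # header, separator, or legend line
        if row["state"] != "UN":
            problems.append(f"{row['address']} is {row['state']}, expected UN")
        owns[row["address"]] = float(row["owns"])
    if not owns:
        return ["no node rows found"]
    spread = max(owns.values()) - min(owns.values())
    if spread > max_spread_pct:
        problems.append(
            f"ownership spread of {spread:.1f} points exceeds {max_spread_pct}"
        )
    return problems
```

In the example output, the largest gap is 37.2 minus 29.4, or 7.8 points, and every node is UN. With the default threshold, the function therefore reports no problems.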
nodetool tpstats
- This command retrieves a list of active and pending tasks, for
example:
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0      517093808         0                 0
ReadStage                         0         0       60651127         0                 0
RequestResponseStage              0         0      371026355         0                 0
ReadRepairStage                   0         0        5530147         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
AntiEntropySessions               0         0          77061         0                 0
HintedHandoff                     0         0             12         0                 0
GossipStage                       0         0        4927463         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0           1092         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                0         0        2217092         0                 0
ValidationExecutor                0         0        1199227         0                 0
MigrationStage                    0         0              0         0                 0
AntiEntropyStage                  0         0        8193502         0                 0
PendingRangeCalculator            0         0             13         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0         148703         0                 0
MemtablePostFlush                 0         0        1378763         0                 0
MemtableReclaimMemory             0         0         148703         0                 0
Native-Transport-Requests         0         0      498700597         0              2131
- The following thread pools are the most important for evaluating the cluster health:
- MutationStage: Cassandra write operations.
- ReadStage: Cassandra read operations.
- CompactionExecutor: compaction operations.
- Native-Transport-Requests: CQL requests from clients.
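The following is a minimal sketch that pulls the four pools listed above out of tpstats output. It assumes Python 3.7 or later and that nodetool is on the PATH. The function names and the default command tuple are illustrative; they are not part of nodetool or Pega Platform. The parser keeps only rows with a pool name followed by five integer columns, matching the layout of the example output.

```python
from __future__ import annotations

import subprocess

# Thread pools singled out above as the most important for cluster health.
KEY_POOLS = ("MutationStage", "ReadStage", "CompactionExecutor", "Native-Transport-Requests")
COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(output: str) -> dict[str, dict[str, int]]:
    """Return the counters of the key pools found in `nodetool tpstats` output."""
    pools: dict[str, dict[str, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        # A pool row is: name, Active, Pending, Completed, Blocked, All time blocked.
        if len(fields) == 6 and fields[0] in KEY_POOLS and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return pools


def collect_snapshot(cmd: tuple[str, ...] = ("nodetool", "tpstats")) -> dict[str, dict[str, int]]:
    """Run the command (assumed to be nodetool on the PATH) and parse its output."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    return parse_tpstats(result.stdout)
```

The cron job recommended in the next paragraph could run a small script built around collect_snapshot and append each timestamped result to a log. That makes it easy to compare Pending and blocked counts across nodes over time.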
Consider configuring the nodetool tpstats command as a cron job that runs periodically and collects load data from each node.
nodetool compactionstats
- This command shows whether Cassandra is processing compactions fast enough, for example:
root@ip-10-123-2-18:/usr/local/tomcat/cassandra/bin# ./nodetool compactionstats
pending tasks: 2
   compaction type   keyspace   table                                       completed       total    unit   progress
        Compaction       data   customer_b01be157931bcbfa32b7f240a638129d   744838490   883624752   bytes     84.29%
Active compaction remaining time :   0h00m00s
- If the output consistently shows pending tasks while Cassandra runs the maximum allowed number of concurrent compactions (the concurrent_compactors setting in cassandra.yaml), compaction is not keeping up and the number of SSTables is growing. A growing number of SSTables results in poor read latencies.
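The "consistently" condition needs several observations, so the following sketch checks a series of compactionstats outputs collected over a period of time. It assumes Python 3.7 or later, and it assumes that each in-progress compaction appears as one row whose last column is a percentage, as in the example above. The caller passes max_concurrent, for example the concurrent_compactors value configured for the node. The function names are hypothetical.

```python
from __future__ import annotations

import re

PENDING = re.compile(r"^\s*pending tasks:\s*(\d+)", re.MULTILINE)


def parse_compactionstats(output: str) -> tuple[int, int]:
    """Return (pending tasks, active compactions) from compactionstats output.

    Assumes each in-progress compaction is one row ending in a percentage.
    """
    match = PENDING.search(output)
    pending = int(match.group(1)) if match else 0
    active = sum(1 for line in output.splitlines() if line.strip().endswith("%"))
    return pending, active


def compaction_backlog(samples: list[str], max_concurrent: int) -> bool:
    """True if every sample has pending tasks while all compaction slots are busy."""
    if not samples:
        return False
    for output in samples:
        pending, active = parse_compactionstats(output)
        if pending == 0 or active < max_concurrent:
            return False
    return True
```

A single sample only shows a moment in time. Requiring every sample to show a full set of running compactions plus pending work keeps short bursts from being reported as a growing SSTable backlog.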
nodetool info
- This command retrieves the key cache, heap, and off-heap usage statistics,
for
example:
root@ip-10-123-2-18:/usr/local/tomcat/cassandra/bin# ./nodetool info
ID                     : 9fbf041a-952c-4709-820c-b2444c8410f3
Gossip active          : true
Thrift active          : true
Native Transport active: true
Load                   : 1.26 TB
Generation No          : 1543592679
Uptime (seconds)       : 1655643
Heap Memory (MB)       : 4864.30 / 12128.00
Off Heap Memory (MB)   : 1840.39
Data Center            : us-east
Rack                   : 1a
Exceptions             : 56
Key Cache              : entries 3647307, size 299.36 MB, capacity 300 MB, 81270677 hits, 341533804 requests, 0.238 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token                  : (invoke with -T/--tokens to see all 256 tokens)
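- If the key cache size is roughly the same as its capacity, the cache is full, and you should consider increasing the key cache capacity. In the example above, the key cache uses 299.36 MB of its 300 MB capacity.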
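The following is a minimal sketch of the key cache check. It assumes Python 3.7 or later and the Key Cache line format shown in the example output. The function reads the size, capacity, hits, and requests fields and returns how full the cache is and its lifetime hit ratio. The 0.95 fill threshold is an assumed stand-in for "roughly the same", and the binary unit spellings (KiB, MiB, GiB) are included on the assumption that some nodetool versions print them.

```python
from __future__ import annotations

import re

# Matches the leading fields of the "Key Cache" line printed by `nodetool info`.
KEY_CACHE = re.compile(
    r"Key Cache\s*:\s*entries \d+, size ([\d.]+) (\w+), capacity ([\d.]+) (\w+), "
    r"(\d+) hits, (\d+) requests"
)
# Byte multipliers for the units that may appear in the size and capacity fields.
UNITS = {"bytes": 1, "KB": 1024, "KiB": 1024, "MB": 1024**2, "MiB": 1024**2,
         "GB": 1024**3, "GiB": 1024**3}


def key_cache_stats(output: str) -> dict[str, float]:
    """Return the key cache fill ratio (size / capacity) and lifetime hit ratio."""
    m = KEY_CACHE.search(output)
    if m is None:
        raise ValueError("no Key Cache line found")
    size = float(m.group(1)) * UNITS[m.group(2)]
    capacity = float(m.group(3)) * UNITS[m.group(4)]
    hits, requests = int(m.group(5)), int(m.group(6))
    return {
        "fill": size / capacity if capacity else 0.0,
        "hit_ratio": hits / requests if requests else 0.0,
    }


def should_grow_key_cache(output: str, threshold: float = 0.95) -> bool:
    """True when the cache size is at or above `threshold` of its capacity."""
    return key_cache_stats(output)["fill"] >= threshold
```

For the example output, the fill ratio is 299.36 / 300, about 0.998, so the check suggests increasing the capacity. The lifetime hit ratio, 81270677 / 341533804, is about 0.238, which agrees with the recent hit rate that nodetool reports.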
nodetool cfstats or nodetool tablestats
- The tablestats command name is available starting from Cassandra version 3; earlier versions use cfstats. This command identifies the tables in which the number of SSTables is growing and shows local read and write latencies and the number of tombstones read per query, for example:
Table: customer_b01be157931bcbfa32b7f240a638129d
SSTable count: 10
Space used (live): 30627181576
Space used (total): 30627181576
Space used by snapshots (total): 0
Off heap memory used (total): 92412446
SSTable Compression Ratio: 0.1259434714106204
Number of keys (estimate): 31569551
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 9436525
Local read latency: 2.237 ms
Local write count: 30788503
Local write latency: 0.015 ms
Pending flushes: 0
Bloom filter false positives: 2220
Bloom filter false ratio: 0.00000
Bloom filter space used: 57390568
Bloom filter off heap memory used: 57390488
Index summary off heap memory used: 6246878
Compression metadata off heap memory used: 28775080
Compacted partition minimum bytes: 5723
Compacted partition maximum bytes: 6866
Compacted partition mean bytes: 6866
Average live cells per slice (last five minutes): 0.9993731802755781
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
- A high number of SSTables (for example, over 100) reduces read performance; healthy systems typically have at most around 25 SSTables per table. In a system where records are deleted often, a high number of tombstones read per query can also increase read latencies.
- Cassandra creates a new SSTable when the memtable data of a column family is flushed to disk. Cassandra stores the SSTable files of a column family in the corresponding column family directory. The data in an SSTable is organized in six types of component files. The name of an SSTable component file has the following format:
keyspace-column family-[tmp marker]-version-generation-component.db
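You can check the SSTable guidance above with a short script. The following is a minimal sketch that assumes Python 3.7 or later and tablestats or cfstats text captured from nodetool. It provides three hypothetical functions:

- sstable_counts maps each keyspace.table to its SSTable count.
- classify applies the figures above: over 25 is above the typical healthy maximum, and over 100 is high.
- parse_sstable_name splits a component file name in the format just shown.

The function names and the "watch" label are hypothetical, and so are the example file names, such as data-customer-ic-12-Data.db. The parser assumes that the tmp marker is the literal string tmp. It relies on keyspace and table names never containing hyphens, because Cassandra allows only letters, digits, and underscores in those names.

```python
from __future__ import annotations

from typing import NamedTuple


def sstable_counts(output: str) -> dict[str, int]:
    """Map keyspace.table to its SSTable count from tablestats/cfstats output."""
    counts: dict[str, int] = {}
    keyspace = table = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Keyspace":
            keyspace, table = value, None
        elif key in ("Table", "Column Family"):  # older versions print "Column Family"
            table = value
        elif key == "SSTable count" and table is not None:
            counts[f"{keyspace}.{table}" if keyspace else table] = int(value)
    return counts


def classify(counts: dict[str, int], healthy_max: int = 25, high: int = 100) -> dict[str, str]:
    """Label each table by its SSTable count."""
    labels: dict[str, str] = {}
    for name, n in counts.items():
        if n > high:
            labels[name] = "high"   # likely to reduce read performance
        elif n > healthy_max:
            labels[name] = "watch"  # above the typical healthy maximum
        else:
            labels[name] = "ok"
    return labels


class SSTableName(NamedTuple):
    keyspace: str
    table: str
    temporary: bool
    version: str
    generation: int
    component: str


def parse_sstable_name(filename: str) -> SSTableName:
    """Split a keyspace-table-[tmp]-version-generation-component.db file name."""
    if not filename.endswith(".db"):
        raise ValueError(f"not an SSTable component file: {filename}")
    parts = filename[: -len(".db")].split("-")
    if len(parts) == 6 and parts[2] == "tmp":
        keyspace, table, _, version, generation, component = parts
        temporary = True
    elif len(parts) == 5:
        keyspace, table, version, generation, component = parts
        temporary = False
    else:
        raise ValueError(f"unexpected SSTable file name: {filename}")
    return SSTableName(keyspace, table, temporary, version, int(generation), component)
```

For the example output above, sstable_counts reports 10 SSTables for the customer table, which classify labels as ok.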
nodetool cfhistograms keyspace tablename or nodetool tablehistograms keyspace tablename
- The tablehistograms command name is available starting from Cassandra version 3; earlier versions use cfhistograms. This command provides further information about tables with high latencies, for example:
data/customer_b01be157931bcbfa32b7f240a638129d histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             2.00              0.00           1916.00              6866                 2
75%             3.00              0.00           2759.00              6866                 2
95%             3.00              0.00           4768.00              6866                 2
98%             4.00              0.00           6866.00              6866                 2
99%             4.00              0.00           8239.00              6866                 2
Min             0.00              0.00             15.00              5723                 2
Max             6.00              0.00       25109160.00              6866                 2
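The latency columns in this output are in microseconds, which makes them easy to misread. The following is a minimal sketch, assuming Python 3.7 or later and the six-column layout shown above. It maps each percentile row to its values and converts the read latency to milliseconds. The column keys and function names are hypothetical.

```python
from __future__ import annotations

COLUMNS = ("sstables", "write_latency_us", "read_latency_us",
           "partition_size_bytes", "cell_count")


def parse_histograms(output: str) -> dict[str, dict[str, float]]:
    """Map each percentile row (50%, ..., Min, Max) to its column values."""
    rows: dict[str, dict[str, float]] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # title, header, or units line
        label = fields[0]
        if not (label.endswith("%") or label in ("Min", "Max")):
            continue
        try:
            rows[label] = dict(zip(COLUMNS, map(float, fields[1:])))
        except ValueError:
            continue  # not a numeric row
    return rows


def read_latency_ms(rows: dict[str, dict[str, float]], label: str = "99%") -> float:
    """Convert the Read Latency value (microseconds) of one row to milliseconds."""
    return rows[label]["read_latency_us"] / 1000.0
```

In the example above, the 99th percentile read latency is 8.239 ms. The Max row shows a read that took 25109160 microseconds, about 25.1 seconds. That gap suggests a few very slow reads rather than uniformly high latency.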