Operating system metrics on Cassandra nodes

Detect problems with Cassandra nodes by analyzing the operating system (OS) metrics.

vmstat
Identifies IO bottlenecks.
In the following example, the wait-io (wa) value climbs as high as 13, which is higher than ideal and is likely contributing to poor read/write latencies. Capture the output of this command during a period of high latency to see whether the node is IO bound and whether that is a likely cause of the latencies.

root@ip-10-123-5-62:/usr/local/tomcat# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 4 0 264572 32008 15463144 0 0 740 792 0 0 6 1 91 2 0
2 3 0 309336 32116 15421616 0 0 55351 109323 59250 89396 13 2 72 13 0
2 2 0 241636 32212 15487008 0 0 57742 50110 61974 89405 13 2 78 7 0
2 0 0 230800 32632 15498648 0 0 63669 11770 64727 98502 15 3 80 2 0
3 2 0 270736 32736 15456960 0 0 64370 94056 62870 94746 13 3 75 9 0
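
If you collect vmstat output regularly, a small script can pick out the samples with elevated wa values. The following Python sketch is one possible approach, not part of Cassandra or vmstat: the function names and the threshold of 5 are assumptions chosen for illustration, and it expects the column layout shown above.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    header = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # column-name row: r b swpd free ...
            header = fields
            continue
        # Keep only all-numeric rows that match the header width.
        # This skips the "procs ---memory---" banner and shell prompts.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def high_iowait(samples, threshold=5):
    """Return the samples whose wa value exceeds the (assumed) threshold."""
    return [s for s in samples if s["wa"] > threshold]


# Two rows taken from the example output above.
SAMPLE = """\
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 4 0 264572 32008 15463144 0 0 740 792 0 0 6 1 91 2 0
2 3 0 309336 32116 15421616 0 0 55351 109323 59250 89396 13 2 72 13 0
"""

print([s["wa"] for s in high_iowait(parse_vmstat(SAMPLE))])  # → [13]
```

Choose a threshold that matches what is normal for your hardware and workload; the value here is only a placeholder.
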
netstat -anp | grep 9042
Shows if network buffers are building up.
The second and third columns in the output show the TCP Recv-Q and Send-Q values, which are the number of bytes waiting in the receive and send queues of each connection. Consistently large numbers for these values indicate that either the local Cassandra node or the client cannot keep up with the network traffic. In the following sample output, one connection has 138 bytes waiting in its send queue:

root@ip-10-123-5-62:/usr/local/tomcat# netstat -anp | grep 9042
tcp 0 0 10.123.5.62:9042 0.0.0.0:* LISTEN 475/java
tcp 0 0 10.123.5.62:9042 10.123.5.58:36826 ESTABLISHED 475/java
tcp 0 0 10.123.5.62:9042 10.123.5.19:54058 ESTABLISHED 475/java
tcp 0 138 10.123.5.62:9042 10.123.5.36:38972 ESTABLISHED 475/java
tcp 0 0 10.123.5.62:9042 10.123.5.75:50436 ESTABLISHED 475/java
tcp 0 0 10.123.5.62:9042 10.123.5.23:46142 ESTABLISHED 475/java
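
To watch for queues that stay non-empty, you can filter the netstat output programmatically. The following Python sketch is an illustration only: the function name and the min_bytes parameter are hypothetical, and it assumes the column order shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```python
def queued_connections(netstat_output, min_bytes=1):
    """Return (recv_q, send_q, local, remote) for established TCP
    connections with at least min_bytes waiting in either queue."""
    flagged = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip shell prompts, headers, and non-TCP lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Only established connections are of interest in this sketch.
        if fields[5] != "ESTABLISHED":
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        if recv_q >= min_bytes or send_q >= min_bytes:
            flagged.append((recv_q, send_q, fields[3], fields[4]))
    return flagged


# Two rows taken from the sample output above.
SAMPLE = """\
tcp 0 0 10.123.5.62:9042 10.123.5.19:54058 ESTABLISHED 475/java
tcp 0 138 10.123.5.62:9042 10.123.5.36:38972 ESTABLISHED 475/java
"""

print(queued_connections(SAMPLE))
# → [(0, 138, '10.123.5.62:9042', '10.123.5.36:38972')]
```

A single non-zero reading is not a problem by itself. Run the check repeatedly and look for connections whose queues remain large across several samples.
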
Log files
Show why Cassandra has stopped working on the node. Log files are usually located in the /var/log/* directory.
In some cases, the OS might have killed the process to protect the system from a larger failure caused by a lack of resources. A common case is a lack of memory, which is indicated by an OOM killer message in the logs.
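
A quick way to check for OOM killer activity is to search the system logs for the kernel's messages. The following Python sketch shows one way to do that. The search patterns are an assumption based on typical Linux kernel messages, whose exact wording varies between kernel versions, and the function names are hypothetical. The log file to scan depends on your distribution, so the path is left as a parameter.

```python
import re

# Assumed patterns: typical Linux kernel OOM messages.
# Adjust them if your kernel words these messages differently.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process",
    re.IGNORECASE,
)


def find_oom_events(lines):
    """Return the log lines that look like OOM killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log_file(path):
    """Scan one log file; path is whichever system log your OS writes to."""
    with open(path, errors="replace") as log:
        return find_oom_events(log)
```

If a match names the Cassandra java process, compare its timestamp with the time the node went down to confirm that the OS killed it.
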