Troubleshooting: CA Wily Introscope memory leak when SMA connected to remote PRPC servers
Summary
JVMs running PRPC are very low on heap memory. The heap dump contains 46,000 or more references to the Java class java.net.ManagedSocketInputStreamHighPerformance, consuming 85 percent of the heap of the PRPC systems. PRPC environments running on WebSphere Application Server that experience this problem have System Management Application (SMA) connected to remote PRPC systems and CA Wily Introscope deployed in every JVM running PRPC. This is a known issue documented in CA knowledge base document TEC534264, APM 9 Agent memory leak caused by Socket Tracer (Legacy KB ID WLY 2906).
Explanation
When using an SMA remote connection with WebSphere Application Server, the WebSphere administrative client, without additional fix packs and configuration, uses HTTP 1.0 by default. With each node connection, information is properly collected about PRPC MBeans. However, the WebSphere administrative client's use of HTTP 1.0 results in 205 socket connections to the WebSphere SOAP port. When Wily socket monitoring is enabled, these JMX calls can maintain a lower-level reference to java.net.ManagedSocketInputStreamHighPerformance within the class java.lang.ThreadLocal.ThreadLocalMap. As a result, the JVM garbage collector cannot remove these references, which causes the memory problem for the PRPC JVM.
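The failure mode can be illustrated with a minimal, self-contained Java sketch. The class ThreadLocalLeakSketch and its wrapStream method below are hypothetical stand-ins for the agent's socket instrumentation, not CA APM APIs; the sketch only demonstrates why objects pinned in a long-lived thread's ThreadLocalMap remain unreachable to the garbage collector for as long as the thread lives.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalLeakSketch {
    // A ThreadLocal owned by an instrumentation agent. Its entry in the
    // thread's ThreadLocalMap lives as long as the thread does; for an
    // application-server worker thread, that is effectively forever.
    private static final ThreadLocal<List<InputStream>> TRACKED =
            ThreadLocal.withInitial(ArrayList::new);

    // Stands in for a tracer that wraps each new socket stream and
    // remembers it in the current thread's ThreadLocalMap.
    static InputStream wrapStream(InputStream raw) {
        TRACKED.get().add(raw);   // the reference is never removed
        return raw;
    }

    public static void main(String[] args) {
        // Each short-lived HTTP 1.0 connection adds one more entry that
        // garbage collection can never reclaim while the thread is alive.
        for (int i = 0; i < 46_000; i++) {
            wrapStream(new ByteArrayInputStream(new byte[1024]));
        }
        System.out.println("Streams pinned on this thread: " + TRACKED.get().size());
    }
}

On an application server the accumulating thread never exits, so the only ways to release the memory are to stop the tracer from recording the streams or to restart the JVM.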
In a reported case, the JVM had been running for about a week when it started to have major memory problems. Heap dumps showed 46,000 or more references to java.net.ManagedSocketInputStreamHighPerformance taking up 85 percent of the memory heap.
Investigative Testing
Testing was done on a production PRPC JVM that was restarted earlier in the day and later taken out of rotation. SMA monitored its health, and IBM HeapAnalyzer results drew attention to the class com.wily.introscope.agent.probe.lang.ManagedThread.
A couple of hours of SMA use resulted in 6,818 objects being maintained. Additional testing with only SMA showed the number of objects in the java/lang/Object array continuing to increase. Heap dumps taken the following day showed the same information; the problem persisted.
For the next test cycle, CA Wily Introscope was removed. After this change, test results showed that no instances of java.net.ManagedSocketInputStreamHighPerformance were being maintained from the JMX tree.
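Each test cycle above depends on taking comparable heap dumps. A minimal sketch for capturing one programmatically on a HotSpot JVM follows; the output path is illustrative, and on the IBM JVMs shipped with WebSphere Application Server dumps are instead produced through IBM-specific mechanisms such as the com.ibm.jvm.Dump API.

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch {
    public static void main(String[] args) throws Exception {
        // Obtain the HotSpot diagnostic MBean from the platform MBean server.
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live objects (a GC runs first), which keeps the
        // file focused on the references the collector cannot reclaim.
        diag.dumpHeap("/tmp/prpc-node.hprof", true);
        System.out.println("Heap dump written to /tmp/prpc-node.hprof");
    }
}

The resulting .hprof file can be opened in IBM HeapAnalyzer or Eclipse Memory Analyzer to compare instance counts of java.net.ManagedSocketInputStreamHighPerformance between test cycles.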
A CA Application Performance Management knowledge base document confirms this problem as a known issue:
TEC534264
APM 9 Agent memory leak caused by Socket Tracer (Legacy KB ID WLY 2906)
After you install the CA Application Performance Management (APM) 9 Agent, the JVM heap size keeps increasing. One finding in common with heap dumps for memory leaks is that one java.lang.ThreadLocal.ThreadLocalMap object accounts for a huge amount of retained heap memory, and this object holds a large array of java.net.ManagedSocketOutputStreamHighPerformance objects.
Suggested Approach
The memory leak was triggered by the new socket-tracing enhancement provided by the CA Application Performance Management (APM) 9 Agent.
To prevent the memory-leak performance problem, configure CA Wily Introscope to use the ManagedSocketTracing setting with the CA APM 9 Agent instead of the default SocketTracing setting until CA provides a patch to fix the default socket tracer.
- Find the CA APM-provided default setting:
TurnOn: SocketTracing
# NOTE: Only one of SocketTracing and ManagedSocketTracing should be 'on'. ManagedSocketTracing is provided to
# enable pre 9.0 socket tracing.
#TurnOn: ManagedSocketTracing
- Change the APM-provided default setting as shown here:
#TurnOn: SocketTracing
# NOTE: Only one of SocketTracing and ManagedSocketTracing should be 'on'. ManagedSocketTracing is provided to
# enable pre 9.0 socket tracing.
TurnOn: ManagedSocketTracing
Additional Information
Memory management revealed - Questions and answers
Performance guidance for production applications - JVMs
References
IBM Monitoring and Diagnostic Tools for Java - Memory Analyzer