Support Play: Troubleshooting agent and service activity performance
Summary
Some performance issues occur when an agent or a service runs. Analyzing such performance problems differs from analyzing an interactive user's performance issue. (For performance analysis for interactive users, see Support Play: A methodology for troubleshooting performance.)
Quick Links
Troubleshooting with PAL Tools
Adding Troubleshooting Steps to an Existing Activity
- PAL Details using the getPAL method
- PAL Details using the PALLog RuleSet
- DB Trace - Using the SetRequestorLevelDBTrace Activity
Troubleshooting Process Using PALLog
Suggested Approach
Use this support play after you have followed steps from the Troubleshooting Performance support play and determined that a performance problem is in an agent or service.
This article describes special ways to run PAL or DB Trace to capture information about agent or service performance. See Troubleshooting Performance for the actual analysis of the captured data.
You can run PAL on any requestor, including agent requestors and service requestors. However, a difficulty when using PAL on an agent or service interaction is that their activities may execute so quickly that you cannot catch the data you want before it is replaced by newer data. As a result, it may be necessary to add custom steps into the agent or service activities to catch the PAL data.
Unlike PAL, you cannot run the DB Trace tool directly on agent or service requestors. The only way to get DB Trace output for these interactions is to add custom steps to the activities, as described below.
Troubleshooting with PAL Tools
The Performance tool (PAL, or Performance AnaLyzer) provides a collection of counters and timer readings that you can use to analyze some performance issues in a system. This tool captures the information necessary to identify processing inefficiencies or excessive use of resources in your application and stores this data in “PAL counters” or “PAL readings” for one requestor or node.
Use PAL to gain insight into where the system is spending resources and identify areas of poor performance, such as delays in processing, refreshing screens, submitting work objects, or other application functions. For details on how to take and analyze PAL readings, see:
- Using Performance Tools in Process Commander Version 4.2
- Using Performance Tools in Process Commander Version 5.1
Performance statistics for all requestors in the node are available through other tools. Depending upon your system version, use one of these tools:
- V4.2: System Console/Monitor Servlet
- V5.1: System Management Application
The first step in analyzing agent performance is to select the requestor for the agent or service in question. The Name of any requestor shows its hash name.
- If the name begins with “H”, this requestor is being used by a user (HTTP interaction)
- If the name begins with “A”, this requestor is being used by a listener or service rule
- If the name begins with “B”, this is a batch requestor (used by agent processing)
In addition, the Client Address column shows the IP address of the computer that is sending information to the requestor. For agents, the “address” is a label (“Master Agent,” “Usage Daemon,” etc.).
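The prefix rule above can be sketched in a few lines of plain Java. This is only an illustration of the naming convention; the class and method names are hypothetical and are not part of any Process Commander API.

```java
// Sketch only: maps the first letter of a requestor name to the kind of
// requestor described above. Class and method names are hypothetical.
final class RequestorNames {

    static String kindOf(String requestorName) {
        if (requestorName == null || requestorName.isEmpty()) {
            return "unknown";
        }
        switch (requestorName.charAt(0)) {
            case 'H': return "user (HTTP interaction)";
            case 'A': return "listener or service";
            case 'B': return "batch (agent processing)";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) {
        // A requestor name beginning with "B" belongs to agent processing.
        System.out.println(kindOf("B0D643C767a0824A580D522AF379DC85F4"));
    }
}
```

When scanning a long requestor list for agent activity, this amounts to filtering for names that begin with "B".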
Using the System Console/Monitor Servlet (Version 4.2)
The System Console has a Requestor Status page, which shows all the requestors running on that node. Once the appropriate requestor has been identified, click the clock icon to the left of the requestor.
The PAL Detail window appears for this requestor.
Note that the PAL counters on this window may be in a different order than on the PAL Detail window available in the portal, and are labeled slightly differently; however, the properties tracked are the same.
Using the System Management Application (Version 5.1)
The System Management Application has a Requestor Management page, which displays information about any of the requestors running on this node of the system. When the appropriate agent requestor has been identified, click the radio button next to that requestor, and then click the Performance Details button at the top of the window.
The PAL Detail window appears for this requestor.
For details on the System Management Application in 5.1, see the System Management Reference Guide.
Adding Troubleshooting Steps to an Existing Activity
As stated above, when troubleshooting agent or service performance, you may be unable to catch the activity run in the PAL window, because the process may run very quickly. In addition, DB Trace cannot be run directly on agent or service requestors. Therefore, it may be more efficient to add troubleshooting steps inside the activity itself, to gather more data on possible problems.
Troubleshooting Tools
The tools that you can add as steps in an activity include:
- getPAL Java method
- PALLog activities
- SetRequestorLevelDBTrace activity
PAL Details using the getPAL Method
The getPAL method allows you to capture PAL information by making the PAL data collection part of the actual agent activity.
getPAL is a method found in the PublicAPI class.
The PAL class has several Java methods, including:
- getStats
- clearStats
You can call these methods from a Java step in any activity.
For an example, inspect the code in the standard activity named Code-Pega-PAL.PALDataGet. The first Java step in that activity calls the getStats method. Other activities defined on the Code-Pega-PAL class demonstrate how to instrument specific processes in an application to capture specific performance data.
PAL Details using the PALLog RuleSet
If you want to avoid coding Java steps, download the PALLog RuleSet (24136_PALLog.ZIP). This RuleSet contains two activities:
- ClearPALData
- SavePALData
These activities call the getPAL methods, and may be used to gather the PAL data inside agent or service activities.
ClearPALData
To use this activity, add a step at the beginning of your own activity to call the ClearPALData activity, to clear the PAL statistics. This step is analogous to clicking the Reset Data link in the standard user tracing steps.
SavePALData
Similarly, add a step to your activity that calls the SavePALData activity at those places where you want to save the PAL reading values. The SavePALData activity takes two parameters:
- SnapShotName – the name of a PAL snapshot file
- PALFilePath – the path to which the file will be written on the server
Important: Some standard agent activities run frequently (once every 30 seconds or so). Placing these troubleshooting steps into an agent activity may result in a number of PAL data files being created. To give the files unique names and prevent them from overwriting each other, you can include a timestamp in the SnapShotName parameter:
"PALforSLA_" + Lib(Pega-RULES:DateTime).CurrentDateTime()
This keeps the file names for multiple PAL readings unique. For example:
- PALforSLA_20061115T100602_B0D643C767a0824A580D522AF379DC85F4.log
- PALforSLA_20061115T100634_B3824DVED4C8864E593C721540098C58E.log
- PALforSLA_20061115T100704_B03E086F25981219891A4CEDC40329FD7.log
(The final portion of these file names is the requestor ID hash code, which is probably but not certainly unique; thus, the timestamp is added.)
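The same naming idea can be sketched in plain Java. This is not the Pega DateTime library call shown above. The timestamp pattern is an assumption inferred from the example file names, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: builds a snapshot name with a timestamp suffix, in the
// style of the example file names. The pattern is an assumption.
final class SnapshotNames {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss");

    static String snapshotName(String prefix, LocalDateTime when) {
        return prefix + STAMP.format(when);
    }

    public static void main(String[] args) {
        // Two readings taken at least one second apart get different names.
        System.out.println(snapshotName("PALforSLA_", LocalDateTime.now()));
    }
}
```

With one-second resolution, two readings taken by the same requestor within the same second would still collide, so this approach suits agents that run on intervals of several seconds or more.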
Each time this activity is called to take PAL readings during the running of an agent or service activity, it creates a file containing the detailed PAL properties and their values. The file contains the same type of data that you see when you click the Save Data link from the summary PAL display.
If this activity is called more than once without clearing the statistics in between, the PAL data collected is cumulative. For example, an activity under investigation has 9 steps. To instrument it, four steps are added:
- call ClearPALData
- (original) Step 1
- Step 2
- Step 3
- call SavePALData (first)
- Step 4
- Step 5
- Step 6
- call SavePALData (second)
- Step 7
- Step 8
- Step 9
- call SavePALData (third)
The above “activity” results in PAL measurements for the first three original steps from the first call to SavePALData. The second call would report PAL statistics for the first six steps; and the third call to SavePALData would measure the data for all 9 steps.
These measurements can be broken down further:
- call ClearPALData
- (original) Step 1
- Step 2
- Step 3
- call SavePALData (first)
- call ClearPALData
- Step 4
- Step 5
- Step 6
- call SavePALData (second)
- call ClearPALData
- Step 7
- Step 8
- Step 9
- call SavePALData (third)
In the above case, the first call to SavePALData would result in PAL measurements for the first three steps. Since the PAL data is then cleared, the second call would result in measurements for original steps 4 through 6 only, and the final run of SavePALData would measure data for original steps 7 through 9.
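The difference between the two layouts can be simulated with a single counter. The following is a toy model in plain Java, not Pega code: the counter stands in for any PAL reading, and `clear` and `save` only play the roles of ClearPALData and SavePALData. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: one counter that grows by 1 per original activity step.
final class PalCounterDemo {

    private int stepsCounted = 0;                          // stand-in for a PAL counter
    private final List<Integer> snapshots = new ArrayList<>();

    void clear()   { stepsCounted = 0; }                   // role of ClearPALData
    void runStep() { stepsCounted++; }                     // one original step
    void save()    { snapshots.add(stepsCounted); }        // role of SavePALData

    // Runs 9 steps, saving after steps 3, 6 and 9.
    static List<Integer> run(boolean clearAfterEachSave) {
        PalCounterDemo pal = new PalCounterDemo();
        pal.clear();
        for (int step = 1; step <= 9; step++) {
            pal.runStep();
            if (step % 3 == 0) {
                pal.save();
                if (clearAfterEachSave && step < 9) {
                    pal.clear();
                }
            }
        }
        return pal.snapshots;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // cumulative readings: [3, 6, 9]
        System.out.println(run(true));  // per-interval readings: [3, 3, 3]
    }
}
```

The cumulative layout needs fewer added steps, while the per-interval layout makes it easier to see which group of steps is expensive without subtracting one snapshot from another.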
DB Trace - Using the SetRequestorLevelDBTrace Activity
The property .pyDBTraceEnabled (on the pxRequestor page) determines whether database tracing is enabled for a particular requestor.
NOTE: This property value has no effect when Global DB Trace is enabled. Global DB Trace enables DB Trace on all requestors in the system, and is best used for system-wide problems.
To set the .pyDBTraceEnabled property, call the standard activity Code-Pega-Requestor.SetRequestorLevelDBTrace.
Pass a boolean parameter enabled to start (true) or end (false) DB Tracing.
When DB Trace runs, it creates a DB Trace text file like the one created directly through the interactive Performance tools, which you can then analyze (as explained in Troubleshooting Performance). The name of this file has several parts:
- user ID
- hash value
- date/timestamp
Example:
WorkUser_AcmeCo.com_F8D6EFA61117A446D2467AB669B352D3_20070227T192928_938_GMT.txt
NOTES:
- In this example, the user ID is WorkUser@AcmeCo.com. Instead of using the “@” symbol in the file name (which could cause problems), an underscore (“_”) was substituted.
- Unlike the PAL activities, you cannot direct the DB Trace data to a specific directory. The DB Trace data is always stored in the ServiceExport directory. (The exact location of this directory varies depending upon your application server. For example, the Apache Tomcat path is /contextRoot/work/Catalina/localhost/prweb/StaticContent/global/ServiceExport where contextRoot is the path defined for this application.)
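When many DB Trace files accumulate in the ServiceExport directory, it can help to split each file name into its parts. The sketch below is plain Java and assumes the layout seen in the single example above (user ID, then a 32-character hexadecimal hash, then the date/timestamp). Other versions may name files differently, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: splits a DB Trace file name into user part, hash, and
// timestamp, assuming the layout of the example in this article.
final class DbTraceFileName {

    // Assumed layout: <user ID>_<32 hex characters>_<date/timestamp>.txt
    private static final Pattern LAYOUT =
            Pattern.compile("^(.+)_([0-9A-Fa-f]{32})_(.+)\\.txt$");

    // Returns {userPart, hash, timestamp}, or null if the name does not match.
    static String[] parse(String fileName) {
        Matcher m = LAYOUT.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse(
            "WorkUser_AcmeCo.com_F8D6EFA61117A446D2467AB669B352D3_20070227T192928_938_GMT.txt");
        System.out.println(String.join(" | ", parts));
    }
}
```

The user part is returned with the underscore substitution intact, because the position of the original “@” cannot be recovered reliably when the user ID itself contains underscores.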
Troubleshooting Process Using PALLog
Application Agent
The process for troubleshooting agent performance using the PALLog activities is as follows:
- Make a new copy of the activity in another RuleSet or RuleSet version.
- Add a step at the beginning of the activity to start DB Trace.
- Add a step at the beginning of the activity to clear the PAL statistics.
- Add steps as desired in the middle of or at the end of the activity to capture new PAL statistics.
- Add a step at the end of the activity to stop DB Trace.
- Run the agent that calls this activity.
- Review the generated performance data.
In the following example, a copy of the standard activity Assign-.ProcessServiceLevelEvents is instrumented with additional steps.
A new first step added to the example activity starts the DB Trace by calling the SetRequestorLevelDBTrace activity. The parameter enabled is checked (true) to start DB tracing.
Next, the developer adds a step 2 that clears the PAL statistics.
Steps 3 through 9 in this example activity are unaltered. The developer adds Step 10 at the end of the processing to write the PAL statistics to a file.
- The SnapShotName parameter contains the name of the PAL snapshot file, and includes the timestamp to give the file a unique name.
- The PALFilePath identifies a directory where this file will be created.
Finally, the developer adds Step 11 at the end of the activity to end the DB Trace, clearing the enabled box, and saves the Activity form.
After the activity is edited, the developer runs the process that calls the activity.
IMPORTANT: Before running the newly edited activity, update the agent's access group to make sure the agent has access to the new activity.
After the activity runs, you can retrieve the output files that were created and analyze them.
Finally, if you edited an existing application activity, remember at the end of this process to either comment out or delete the troubleshooting steps added above.
Pega Agent
It’s possible that the performance issue isn’t in an agent which is part of your application, but is in one of the standard Pega-****- RuleSets. In this case, you can't update the agent activity to add new steps, because it belongs to a locked RuleSet Version.
To instrument standard activities, the procedure has a few additional twists:
- Create a copy of the standard agent activity
- Create a copy of the agent’s Rule-Agent-Queue instance
- Add steps as above
- Disable the standard agent
- Run the new agent
- Review the generated performance data
Create a copy of the Pega agent activity
If the agent activity is not set to FINAL availability, make a copy of the activity and save it with the same name into a higher “working” RuleSet in the RuleSet List (perhaps an open custom RuleSet, or the developer’s troubleshooting RuleSet), so that it will get chosen by rule resolution. If it is possible to copy the activity and keep the same name, then it is not necessary to change anything else – rule resolution will make sure this new activity is used.
Important: When making a copy of the activity, make sure that the RuleSet the activity is saved in is accessible to the agent! Check the agent’s access group, and if this RuleSet is not part of the access group, add it.
If the agent activity is set to Final, and so can’t be overridden, then you can save it into a higher RuleSet with a slightly different name. In this case, it is necessary to disable the original agent, in either of two ways:
- Disable the agent in the Monitor Servlet (for Version 4.2) or the System Management Application (Version 5.1).
- Find the Data-Agent-Queue instance which contains the standard agent activity, and disable that agent activity.
After you disable the standard activity, add the new activity (with the new name) to one of the custom agents for the application. Again, make certain that this activity is in a RuleSet which is accessible to the agent (through its access group).
Follow the procedure above for editing the activity, running the agent, and reviewing the data.
REMEMBER: After the troubleshooting is completed, re-enable the original agent, and disable or delete the new activity with the troubleshooting steps.
Service - Version 4.2
Version 4.2 contains specific tools for troubleshooting services. The best method for getting standard PAL data for services is to run PAL from the service activity, using the same process as described above for an agent activity.
Service - Version 5.1
Version 5.1 includes a number of PAL statistics that provide further information about service interactions:
PAL Label | Description |
---|---|
CPU time to process parse rules | When mapping data, the system may use parse rules. This reading measures the CPU time spent processing parse rules. If this measurement is over 0.5 seconds, review the data being parsed to see if there are issues (for example, a problem with the data structures in the file, or a change in the structure for some of the records). |
Elapsed time to process parse rules | This reading measures the elapsed (total) time spent processing the Parse rules. |
Number of parse rules | This reading counts the number of parse rules executed. |
CPU Inbound Mapping Time | Whenever a Rule-Service- rule receives a request, data must be mapped from that request to Process Commander properties. This reading measures the CPU time spent mapping the inbound data. |
Elapsed Inbound Mapping Time | This reading measures the elapsed (total) time spent mapping the inbound data for a Rule-Service request. |
CPU Outbound Mapping Time | Whenever a Rule-Service receives and processes a request, the response data must be mapped from properties to the form the external system expects. This reading measures the CPU time spent mapping the outbound data. |
Elapsed Outbound Mapping Time | This reading measures the elapsed (total) time spent mapping the outbound data for a response to an external system request. |
CPU Activity Time | Whenever a Rule-Service receives and processes a request, after the data is mapped for the response, the system runs a “service” activity. This reading measures the CPU time spent running a service activity. |
Elapsed Activity Time | This reading measures the elapsed (total) time spent when the system runs a service activity. |
Number of records in file | This reading counts the number of records in files processed by File Listeners. |
Number of Bytes received by the Server through Services | This reading displays the amount of data received by the server through a service request, measured in Kbytes. |
These PAL counters provide a detailed picture of where time is spent during a service interaction.
For full details on how these can be used to troubleshoot services, see Testing Services and Connectors in Version 5.1.
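The one numeric guideline in the table above, the 0.5-second limit on CPU time for parse rules, can be expressed as a simple check. This is a sketch in plain Java. How the reading is obtained is outside its scope, and the class and method names are hypothetical.

```java
// Sketch only: flags a "CPU time to process parse rules" reading that
// exceeds the 0.5-second guideline from the table above.
final class ParseRuleCheck {

    static final double CPU_THRESHOLD_SECONDS = 0.5;

    static boolean needsReview(double parseCpuSeconds) {
        return parseCpuSeconds > CPU_THRESHOLD_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(needsReview(0.72)); // true: review the parsed data
        System.out.println(needsReview(0.10)); // false
    }
}
```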
Additional Resources
- Support Play: A methodology for troubleshooting performance
- Troubleshooting Agents
- Using Performance Tools in Process Commander Version 4.2
- Using Performance Tools in Process Commander Version 5.1
Need Further Help?
If you have followed this Support Play, but require additional help, contact Global Customer Support by logging a Support Request.