Identify health issues with your Stream service by reviewing the data tables that provide information about Kafka sessions and clusters.
Data-Decision-StreamService-Session (pr_data_stream_sessions) represents the currently active Kafka sessions. Each row holds the following information:
- Unique session ID
- Session timeout: sessions whose last seen date time is older than the timeout are considered expired.
- Start time: session start time as a Unix timestamp.
- Last seen date time: Unix timestamp of the last heartbeat.
- Node ID: identifier of the Pega Platform node that originated the session.
This table should contain one row per Stream service node. Monitoring this table can help you identify health issues. Under normal circumstances, the content of the table is static and changes only when Stream nodes are decommissioned or restarted.
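The expiry rule and the one-row-per-node expectation can be sketched as a small check over rows that have already been read from pr_data_stream_sessions. This is a minimal illustration, not a Pega API: the field names in SessionRow, the function names, and the assumption that the timeout uses the same unit as the Unix timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRow:
    # Hypothetical field names; the real column names may differ.
    session_id: str
    timeout: int     # session timeout, assumed to use the same unit as the timestamps
    start_time: int  # Unix timestamp of the session start
    last_seen: int   # Unix timestamp of the last heartbeat
    node_id: str


def is_expired(row, now):
    """A session is expired when its last heartbeat is older than the timeout."""
    return now - row.last_seen > row.timeout


def check_sessions(rows, stream_node_ids, now):
    """Return a list of findings; an empty list means nothing looks suspicious."""
    findings = []
    for row in rows:
        if is_expired(row, now):
            findings.append(f"session {row.session_id} on node {row.node_id} is expired")

    # Every Stream node is expected to own one session that is still alive.
    live_nodes = {row.node_id for row in rows if not is_expired(row, now)}
    for node_id in sorted(set(stream_node_ids) - live_nodes):
        findings.append(f"node {node_id} has no active session")

    # The table is expected to hold exactly one row per Stream node.
    expected = len(set(stream_node_ids))
    if len(rows) != expected:
        findings.append(f"expected {expected} rows, found {len(rows)}")
    return findings
```

Calling check_sessions with the current rows, the list of known Stream node IDs, and the current time returns one message per expired session, per node without a live session, and for a row count mismatch.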
If the number of rows in the pr_data_stream_sessions table does not match the number of Stream nodes, or if session IDs change frequently, one of the following problems might be the cause:
- Pega Platform is unable to connect to the Pega database. Kafka keeps sending session ping and connect requests, but because the system cannot write to the pr_data_stream_sessions table, all of these requests are rejected. This problem is typically easy to detect because the logs fill up with database connection errors.
- The database is too slow. For example, if reading from or writing to the sessions table takes more than 10 seconds, sessions are invalidated frequently and the Stream service behaves erratically. Pega Platform might appear to be functional, though very slow, while the Stream service constantly goes up and down.
- Clock drift, where the nodes in a cluster report different times. To resolve this problem, configure NTP for every installation.
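One way to spot frequently changing session IDs is to compare two snapshots of the sessions table taken some time apart. The sketch below assumes each snapshot has already been reduced to a dictionary that maps node ID to session ID; the function name and the snapshot shape are hypothetical.

```python
def changed_sessions(previous, current):
    """Return {node_id: (old_session_id, new_session_id)} for nodes whose
    session ID differs between two snapshots of the sessions table."""
    return {
        node_id: (previous[node_id], session_id)
        for node_id, session_id in current.items()
        if node_id in previous and previous[node_id] != session_id
    }
```

An empty result between snapshots matches the expected static behavior. Repeated non-empty results for nodes that were not restarted suggest that sessions are being invalidated and re-created.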
Data-Decision-StreamService-Node (pr_data_stream_nodes) contains all meta-information about the Kafka cluster:
- List of all known Stream nodes
- List of all topics and their configuration
- Data partition distribution across Stream nodes including replicas
- Current controller node
Kafka automatically elects one node as a controller, which is responsible for managing the states of partitions and replicas, and for performing administrative tasks, such as reassigning partitions in case of a failure. Only one node at a time can be a controller node.
The pr_data_stream_nodes table is expected to contain 500 to 5000 rows, depending on the number of topics. It replaces Apache ZooKeeper for Kafka.
Data-Decision-StreamService-NodeUpdate (pr_data_stream_updates) keeps the list of recent updates to the pr_data_stream_nodes table. The table does not play a significant role in Stream service operations.
Active-active and active-passive modes
If you run Pega Platform in active-active or active-passive mode with two Pega databases replicated through Oracle GoldenGate or a similar technology, do not replicate the pr_data_stream_* tables. Replicating these tables causes inconsistencies and eventual data loss.
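When the list of tables to replicate is generated by a script, the exclusion can be applied with a simple name filter. The following sketch is tool-agnostic and does not use GoldenGate syntax; the function name and the sample table names in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase

# Stream service tables that must stay local to each database.
STREAM_TABLE_PATTERN = "pr_data_stream_*"


def tables_to_replicate(all_tables):
    """Drop every table whose name matches pr_data_stream_*, ignoring case."""
    return [
        table
        for table in all_tables
        if not fnmatchcase(table.lower(), STREAM_TABLE_PATTERN)
    ]
```

For example, passing ["orders", "pr_data_stream_sessions", "PR_DATA_STREAM_NODES"] returns only ["orders"].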