
Monitoring the Stream service

Updated on July 5, 2022

Ensure that your Stream service operates without errors by regularly monitoring the status of your Stream nodes, partitions, disk space, CPU usage, and database availability.

Note:

Starting in version 8.7, the use of an internal Kafka is deprecated. On-premises systems that have been updated from earlier versions of Pega Platform can continue to use Kafka in embedded mode. However, to ensure future compatibility, do not create any new environments using embedded Kafka. When configuring the Stream service in a new environment, use external Kafka.

When the Stream service is running in external mode, the Stream landing page does not display any node details. Under normal conditions, only the NORMAL status is displayed, which means that Pega Platform can connect to the external Kafka cluster and use it for the Stream service.

Landing page

You can do a simple, manual status check of the Stream service by going to the Stream landing page.

In the header of Dev Studio, click Configure > Decisioning > Infrastructure > Services > Stream.

[Image: Stream landing page in Dev Studio]

Ensure that:

  • The status of each Stream node is NORMAL.
  • There are no offline partitions.
  • There are no under-replicated partitions.

    You might see under-replicated partitions from time to time; however, the value should stay at 0 most of the time. If you consistently see under-replicated partitions, the Stream service might be undersized. In that case, check the CPU usage of your Stream nodes: if CPU usage averages above 80%, consider scaling up your Stream node cluster.

  • There is exactly one node for which the Is controller parameter is set to 1.

    If you see no controller node, or more than one controller node, perform a rolling restart of the Stream nodes. A controller node is responsible for managing the states of partitions and replicas, and for performing administrative tasks such as reassigning partitions in case of a failure. For more information, see Operating the Stream service.

  • The total number of partitions does not exceed 1,000.

    You can automate all of these checks, as shown in the sketch after this list.
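The following is a minimal sketch that uses the Kafka Java AdminClient to perform the checks in the list above. The class name and bootstrap address are assumptions, and your environment might require additional security settings.

```
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

public class StreamServiceCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical address; point this at one of your Stream (Kafka) nodes.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "stream-node-1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Exactly one broker must act as the controller.
            Node controller = admin.describeCluster().controller().get();
            System.out.println("Controller: " + (controller == null ? "NONE" : controller.id()));

            Set<String> topics = admin.listTopics().names().get();
            int total = 0, offline = 0, underReplicated = 0;
            for (TopicDescription topic : admin.describeTopics(topics).all().get().values()) {
                for (TopicPartitionInfo p : topic.partitions()) {
                    total++;
                    // A partition without a leader is offline.
                    if (p.leader() == null) {
                        offline++;
                    }
                    // Under-replicated: fewer in-sync replicas than total replicas.
                    if (p.isr().size() < p.replicas().size()) {
                        underReplicated++;
                    }
                }
            }
            System.out.println("Total partitions: " + total);
            System.out.println("Offline partitions: " + offline + " (must be 0)");
            System.out.println("Under-replicated partitions: " + underReplicated + " (should stay at 0)");
        }
    }
}
```

Running such a check on a schedule and alerting on non-zero offline or under-replicated counts gives you the same information as the landing page without a manual visit.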

Maximum number of partitions per broker

There is a known limit on the number of partitions per broker. The limit depends on the configuration of the machine that the broker is running on. As a general rule, limit the number of partitions per broker to 1,000. When more than 1,000 partitions are configured per broker, issues with the synchronization of partition replicas might occur, leading to under-replicated partitions. The limit of 1,000 partitions applies to the internal Kafka in Pega Cloud. For the specific limits in other environment types, see the following table:

Maximum supported partitions per broker in relation to CPU and memory

Machine type                                Maximum supported partitions per broker
CPU (core): 2, Memory (GiB): 2              300
CPU (core): 2, Memory (GiB): 8              1,000
CPU (core): 4, Memory (GiB): 16             1,000
CPU (core): 8, Memory (GiB): 32             2,000
CPU (core): 16, Memory (GiB): 64 and up     4,000
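
To see how close each broker is to these limits, count the partition replicas that each broker hosts. The following is a minimal helper method that could be added to the sketch above, together with java.util.HashMap and java.util.Map imports; the method name is illustrative.

```
// Counts the partition replicas hosted on each broker; compare the results
// with the per-broker limits in the table above.
static void printReplicasPerBroker(AdminClient admin) throws Exception {
    Map<Integer, Integer> perBroker = new HashMap<>();
    for (TopicDescription topic :
            admin.describeTopics(admin.listTopics().names().get()).all().get().values()) {
        for (TopicPartitionInfo p : topic.partitions()) {
            for (Node replica : p.replicas()) {
                perBroker.merge(replica.id(), 1, Integer::sum);
            }
        }
    }
    perBroker.forEach((broker, count) ->
            System.out.println("Broker " + broker + " hosts " + count + " partition replicas"));
}
```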

Health check

Monitor Pega Platform instances by associating a health check with every node in a cluster. Take unhealthy nodes out of service and restart them.

For more information, see Verifying that an instance is running.
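
As a sketch of what such a health check might look like, the following Java snippet polls a node over HTTP. The host, port, and path are assumptions; verify the health check endpoint that applies to your deployment.

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class NodeHealthCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical node address and health check path; adapt to your deployment.
        String url = "http://pega-node-1:8080/prweb/PRRestService/monitor/pingservice/ping";

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Anything other than HTTP 200 counts as unhealthy; a load balancer or
        // orchestrator would then take the node out of service and restart it.
        System.out.println(response.statusCode() == 200 ? "HEALTHY" : "UNHEALTHY");
    }
}
```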

Disk space

The Stream service stores its data on disk. Monitor disk usage to prevent running out of disk space. If a Stream node runs out of disk space, it stops functioning until space becomes available again, which can lead to data loss and overall service instability.

Monitor the available disk space and take action, for example, by increasing disk space, when free space drops below 30%.
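
A minimal sketch of such a disk space check in Java; the data directory path is a hypothetical example, so use the directory where your Stream nodes store their Kafka data:

```
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskSpaceCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical data directory; use the log directory of your Stream nodes.
        FileStore store = Files.getFileStore(Path.of("/opt/kafka-data"));

        double freeRatio = (double) store.getUsableSpace() / store.getTotalSpace();
        System.out.printf("Free disk space: %.1f%%%n", freeRatio * 100);

        // Take action, for example by increasing disk space, below 30% free.
        if (freeRatio < 0.30) {
            System.out.println("WARNING: free space is below 30%");
        }
    }
}
```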

CPU

Monitor the CPU usage on every Stream service node. If the CPU average usage is above 70%, consider provisioning a bigger node.
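
A minimal sketch of such a CPU check, averaging the system CPU load over a short sampling window (the sample count and interval are arbitrary choices):

```
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class CpuCheck {
    public static void main(String[] args) throws Exception {
        OperatingSystemMXBean os = (OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();

        // Sample the system CPU load once per second for ten seconds.
        double total = 0;
        int samples = 10;
        for (int i = 0; i < samples; i++) {
            total += os.getSystemCpuLoad(); // 0.0-1.0; negative if unavailable
            Thread.sleep(1000);
        }

        double average = total / samples;
        System.out.printf("Average CPU load: %.1f%%%n", average * 100);
        if (average > 0.70) {
            System.out.println("Average above 70%: consider provisioning a bigger node");
        }
    }
}
```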

Database

The Stream service relies on the availability of the Pega Platform database. A slow or unavailable database might have a serious impact on the Stream service.

Monitor queries to the following tables:

  • pr_data_stream_sessions
  • pr_data_stream_nodes
  • pr_data_stream_node_updates

If queries take more than 1 second, consider tuning your database.
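
A minimal sketch of such a latency check over JDBC, assuming a PostgreSQL database with the driver on the classpath; the connection URL, credentials, and the COUNT query are placeholders:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamTableLatencyCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; point this at the Pega Platform database.
        String url = "jdbc:postgresql://db-host:5432/pega";
        String[] tables = {
            "pr_data_stream_sessions",
            "pr_data_stream_nodes",
            "pr_data_stream_node_updates"
        };

        try (Connection conn = DriverManager.getConnection(url, "pegauser", "secret");
             Statement stmt = conn.createStatement()) {
            for (String table : tables) {
                long start = System.nanoTime();
                try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
                    rs.next();
                }
                long millis = (System.nanoTime() - start) / 1_000_000;
                // More than 1 second suggests that the database needs tuning.
                System.out.println(table + ": " + millis + " ms" + (millis > 1000 ? " <-- SLOW" : ""));
            }
        }
    }
}
```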

In the case of database unavailability, whether planned or unplanned, consider restarting your Stream nodes, especially if they become unhealthy.

File descriptors

If you encounter any of the following issues while monitoring the Stream service:

  • Stream nodes are getting terminated.
  • A Stream node terminates and fails to start.
  • The number of file descriptors in use increases constantly.

    Note: This can be a side effect of a partition count mismatch between data flows and Kafka.

Get the Kafka server logs and look for the following exception to confirm that the Stream service is failing due to the file descriptor limit:

```
[2022-02-16 10:37:14,737] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.8.0_144]
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[?:1.8.0_144]
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[?:1.8.0_144]
	at kafka.network.Acceptor.accept(SocketServer.scala:414) ~[kafka_2.11-1.1.0.9.jar:?]
	at kafka.network.Acceptor.run(SocketServer.scala:357) [kafka_2.11-1.1.0.9.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
```
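
To track file descriptor usage over time, you can read the counts that the JVM exposes on Unix-like systems. This sketch reports on the process it runs in; to inspect a separate Kafka broker process, check /proc/<pid>/fd or use lsof instead. The 80% threshold is an arbitrary example.

```
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class FileDescriptorCheck {
    public static void main(String[] args) {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            long open = unixOs.getOpenFileDescriptorCount();
            long max = unixOs.getMaxFileDescriptorCount();
            System.out.println("Open file descriptors: " + open + " / " + max);

            // A count that climbs steadily toward the limit precedes the
            // "Too many open files" errors shown in the log excerpt above.
            if (open > max * 0.8) {
                System.out.println("WARNING: above 80% of the file descriptor limit");
            }
        }
    }
}
```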
