Configuring the Stream service

Updated on July 5, 2022

This content applies only to On-premises and Client-managed cloud environments

Configure the Stream service to ingest, route, and deliver high volumes of data such as web clicks, transactions, sensor data, and customer interaction history.

Distribution and replication of the stream data records ensure scalability and fault tolerance of the Stream service. The service runs as a cluster on one or more servers.

Note: You can also configure the Stream service to run in External Mode, which decouples Pega Platform from Kafka and removes the need for nodes of the Stream type. For more information, see Configuring External Kafka as a Stream service.

For additional guidelines regarding throughput, disk space requirements, and compression, see Best practices for Stream service configuration.

Stream node type

When planning your deployment, assign the Stream node type to at least two and at most four nodes in one Pega Platform cluster.

If you plan to have more than four Stream nodes, contact Global Customer Support to assist with your deployment.

Important:
  • Enable Stream nodes by configuring the node type as -DNodeType=Stream.
  • Do not mix Stream nodes with other node types.
  • Run a single JVM on a physical server to increase the resilience of the entire deployment.

For more information, see Assigning node types to nodes for on-premises environments.

Node identification

Each Pega Platform node is identified with a Node ID that must be unique in the cluster. If the same Node ID is already used in the cluster, the node fails to start.

Use the node ID to identify nodes and their purposes at a glance. By default, a node ID is generated from certain system setting values. However, as a best practice, set the node ID manually to reflect the node’s intended purpose. To set the node ID, use a JVM argument, as shown in the following example:

-Didentification.nodeid=stream-node-1

Important: Preserve the node ID after a server restart so that you can identify it as a previously known node.

Data replication

The Stream service replicates every record across a configurable number of servers. When a server in the cluster fails, the service automatically fails over to these replicas, so messages remain available.

By default, the Stream service keeps two replicas of each record. If you increase the number of Stream nodes from two to three or four, change the data replication setting to match the number of Stream nodes. You can do so by using prconfig on every Pega Platform node, or by using the following dynamic system setting:

Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyReplicationFactor/default
Value: <number_of_stream_nodes>
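
The prconfig alternative mentioned above can be sketched as follows. This is a minimal example that assumes a cluster with three Stream nodes. The entry name is an assumption inferred from the dynamic system setting purpose, following the same naming pattern as the other prconfig.xml entries in this article, so verify it against your Pega Platform version before use.

```xml
<!-- Assumption: three Stream nodes, so three replicas of each record. -->
<!-- Assumption: entry name inferred from the dynamic system setting purpose. -->
<env name="dsm/services/stream/pyReplicationFactor" value="3"/>
```

If you use this approach, add the same entry to the prconfig.xml file on every Pega Platform node.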

Data files location

By default, the Stream service stores its data in the java_ee_server_root/kafka-data folder. Change this location to a folder that you can monitor and secure against accidental data deletion.

Important: Do not use network attached storage or shared folders to store your stream data.

To change the default directory for a single server, in the prconfig.xml file, add the following entry:

<env name="dsm/services/stream/pyBaseLogPath" value="/data/kafka-data"/>

To change the default directory for all servers in the cluster, create a dynamic system setting with the following options:

Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyBaseLogPath/default
Value: /data/kafka-data

Ensure that you have at least 100 GB of disk space available to accommodate standard background processing activities.

Apache Kafka distribution location

When the Stream service is enabled in Pega Platform, the Apache Kafka distribution is unpacked in the following directory: java_ee_server_root/kafka-version

If the default location is write-protected and you need to change it, use one of the following methods:

  • In the prconfig.xml file, add the following entry:
    <env name="dsm/services/stream/pyUnpackBasePath" value="/opt/kafka" />
  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/pyUnpackBasePath/default
    Value: /opt/kafka

Operating system

Deploy the Stream service on Linux or any other Unix system.

Running Stream nodes on Windows might cause issues and is not recommended in production environments.

The Stream service uses file descriptors for data files and open connections. Allow a limit of at least 100000 file descriptors. If the limit is too low, the Stream service might exceed it and fail. For information about raising the limit (ulimit), see your operating system documentation.

Clock synchronization

Ensure that the clocks on Stream nodes do not drift and that they stay synchronized within a 30-second window. An effective way to synchronize clocks across all Pega Platform nodes is to use NTP.

Multiple JVMs on a single host

Do not run multiple Stream service JVMs on a single host. Doing so reduces overall cluster resiliency and data availability if the entire host fails.

However, if such a setup is required, configure dedicated, non-conflicting ports for each Stream service JVM.

The Stream service uses three IP address and port pairs for internal communication. Assign a distinct set of ports for each JVM on a single host.

<!-- IP and port for communication between Pega nodes -->
<env name="dsm/services/stream/pyBrokerAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyBrokerPort" value="9092"/>

<!-- IP and port for configuration management -->
<env name="dsm/services/stream/pyKeeperAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyKeeperPort" value="2181"/>

<!-- Port for local Kafka management. Kafka JMX always runs on localhost -->
<env name="dsm/services/stream/pyJmxPort" value="9999"/>

<!-- Port for HTTP streaming -->
<env name="dsm/services/stream/pyPort" value="7003"/>
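
As an illustration of a distinct port set, a second Stream service JVM on the same host might use entries like the following. This is only a sketch: every port value below is a hypothetical choice, not a recommended or default value, so pick ports that are free on your host and do not collide with the first JVM.

```xml
<!-- Hypothetical port set for a second Stream service JVM on the same host. -->
<!-- All port values are illustrative assumptions; they only need to differ from the first JVM. -->
<env name="dsm/services/stream/pyBrokerAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyBrokerPort" value="9093"/>

<env name="dsm/services/stream/pyKeeperAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyKeeperPort" value="2182"/>

<env name="dsm/services/stream/pyJmxPort" value="9998"/>

<env name="dsm/services/stream/pyPort" value="7004"/>
```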

JVM heap size

You are unlikely to need to increase the default JVM heap settings for the Stream service. However, if you do, use one of the following settings:

  • Add an entry in the prconfig.xml file, for example:
    <env name="dsm/services/stream/pyHeapOptions" value="-Xmx4G -Xms4G" />
  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/pyHeapOptions/default
    Value, for example: -Xmx4G -Xms4G

Garbage collector logs

The garbage collector removes discarded objects from the heap to free up allocation space. Garbage collector logs provide information about the memory cleaning process and help in identifying performance issues. By using the following settings, you can configure the name, count, and size for the log files.

  • Add an entry in the prconfig.xml file, for example:
    <env name="dsm/services/stream/pyGcLogOptions" value="-Xlog:gc*:file=kafkaServer-gc.log:time,tags:filecount=10,filesize=102400"/>
  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/pyGcLogOptions/default
    Value, for example: -Xlog:gc*:file=kafkaServer-gc.log:time,tags:filecount=10,filesize=102400

JVM tuning

You can tune the garbage collection performance of the JVM. In the following example, the default Garbage-First (G1) garbage collector is selected and the maximum pause time for garbage collection is set to 20 milliseconds.

  • Add an entry in the prconfig.xml file, for example:
    <env name="dsm/services/stream/pyJvmPerformanceOptions" value="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20"/>
  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/pyJvmPerformanceOptions/default
    Value, for example: -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20

Multiple availability zones

Spread your Stream nodes across multiple availability zones. To distribute data replicas evenly across availability zones, use the following settings to configure AZ names:

  • Add an entry in the prconfig.xml file, for example:
    <env name="dsm/services/stream/server_properties/broker.rack" value="AZ-1" />
  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/server_properties/broker.rack/default
    Value, for example: AZ-1
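
In Apache Kafka, broker.rack is a per-broker property, so each Stream node reports the zone in which it runs. As a sketch, a Stream node in a second availability zone might use the following prconfig.xml entry. The zone label AZ-2 is an assumed name used here for illustration; use the zone names of your own environment.

```xml
<!-- prconfig.xml on a Stream node that runs in a second availability zone. -->
<!-- Assumption: AZ-2 is an illustrative zone label, not a required value. -->
<env name="dsm/services/stream/server_properties/broker.rack" value="AZ-2" />
```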

General settings in the server.properties file

You can set the general settings in the server.properties file by using the following format:

<env name="dsm/services/stream/server_properties/property" value="value"/>

where:

  • property is the name of the property that you want to modify.
  • value is the value of that property.
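
For instance, the format above could be applied to two standard Apache Kafka broker properties, num.network.threads and num.io.threads. The values shown are illustrative assumptions only, not tuning recommendations for the Stream service.

```xml
<!-- Illustrative only: standard Kafka broker properties passed through server_properties. -->
<!-- The values are assumptions for demonstration, not recommendations. -->
<env name="dsm/services/stream/server_properties/num.network.threads" value="3"/>
<env name="dsm/services/stream/server_properties/num.io.threads" value="8"/>
```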
