Configuring the Stream service
This content applies only to On-premises and Client-managed cloud environments
Configure the Stream service to ingest, route, and deliver high volumes of data such as web clicks, transactions, sensor data, and customer interaction history.
Distribution and replication of the stream data records ensure scalability and fault tolerance of the Stream service. The service runs as a cluster on one or more servers.
For additional guidelines regarding throughput, disk space requirements, and compression, see Best practices for Stream service configuration.
Stream node type
When planning your deployment, assign the Stream node type to at least two and at most four nodes in one Pega Platform cluster. If you plan to have more than four Stream nodes, contact Global Customer Support for assistance with your deployment.
For more information, see Assigning node types to nodes for on-premises environments.
Node identification
Each Pega Platform node is identified with a Node ID that must be unique in the cluster. If the same Node ID is already used in the cluster, the node fails to start.
A node ID is generated by default based on certain system setting values. However, as a best practice, set the node ID manually to reflect the node's intended purpose, so that you can identify nodes and their purposes at a glance. To set the node ID, use a JVM argument as shown in the following example:
-Didentification.nodeid=stream-node-1
Data replication
The Stream service replicates every record across a configurable number of servers. When a server in the cluster fails, the service automatically fails over to these replicas, so messages remain available despite failures.
By default, the Stream service keeps two replicas of each record. If you increase the number of Stream nodes from two to three or four, change the data replication setting to match the number of Stream nodes. You can do this by using the prconfig.xml file on every Pega Platform node, or by creating a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/pyReplicationFactor/default
- Value: <number_of_stream_nodes>
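Mirroring the dynamic system setting, the per-node alternative would be a prconfig.xml entry like the following sketch, where the value 3 assumes a three-node Stream cluster:

```xml
<!-- Example: match the replication factor to a three-node Stream cluster -->
<env name="dsm/services/stream/pyReplicationFactor" value="3"/>
```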
Data files location
By default, the Stream service stores its data in the java_ee_server_root/kafka-data folder. Change this location to a folder that you can monitor and secure against accidental data deletion.
To change the default directory for a single server, add the following entry in the prconfig.xml file:
<env name="dsm/services/stream/pyBaseLogPath" value="/data/kafka-data"/>
To change the default directory for all servers in the cluster, create a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/pyBaseLogPath/default
- Value: /data/kafka-data
Ensure that you have at least 100 GB of disk space available to accommodate standard background processing activities.
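As a quick sanity check, a short shell script can report free space on the data volume. The path /data/kafka-data is the example location used above; the script falls back to the root filesystem if that path does not exist yet.

```shell
# Report free space on the Stream data volume and warn below 100 GB.
DATA_DIR="${DATA_DIR:-/data/kafka-data}"
# Fall back to the root filesystem if the example path does not exist yet.
[ -d "$DATA_DIR" ] || DATA_DIR=/

# df -P guarantees POSIX output; column 4 is available space in 1 KB blocks.
avail_kb=$(df -P "$DATA_DIR" | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))

if [ "$avail_gb" -lt 100 ]; then
    echo "WARNING: only ${avail_gb} GB free under ${DATA_DIR}; at least 100 GB is recommended."
else
    echo "OK: ${avail_gb} GB free under ${DATA_DIR}."
fi
```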
Apache Kafka distribution location
When the Stream service is enabled in Pega Platform, the Apache Kafka distribution is unpacked in the java_ee_server_root/kafka-version directory.
If the default location is secured against write operations and you need to change it, use one of the following methods:
- In the prconfig.xml file, add the following entry:
  <env name="dsm/services/stream/pyUnpackBasePath" value="/opt/kafka" />
- Create a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/pyUnpackBasePath/default
- Value: /opt/kafka
Operating system
Deploy the Stream service on Linux or another Unix-like system.
Running Stream nodes on Windows might cause issues and is not recommended in production environments.
The Stream service uses file descriptors for data files and open connections. Set the file descriptor limit to at least 100000. If the limit is too low, the Stream service can exceed it and fail. Check your operating system documentation for how to raise the limit (ulimit).
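The current limit can be checked from a shell as sketched below. The user name "pega" in the suggested limits.conf lines is a placeholder for whichever account runs the Stream service JVM.

```shell
# Check the open file descriptor limit for the current shell session.
limit=$(ulimit -n)
echo "Current file descriptor limit: $limit"

# The Stream service needs at least 100000 descriptors.
if [ "$limit" != "unlimited" ] && [ "$limit" -lt 100000 ]; then
    # Raise it persistently, for example in /etc/security/limits.conf
    # ("pega" is a placeholder for the user that runs the JVM):
    echo "Limit too low; consider adding to /etc/security/limits.conf:"
    echo "  pega  soft  nofile  100000"
    echo "  pega  hard  nofile  100000"
fi
```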
Clock synchronization
Ensure that clocks on Stream nodes do not drift and stay synchronized within a 30-second window. An effective way to synchronize clocks across all Pega Platform nodes is to use the Network Time Protocol (NTP).
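How you verify synchronization depends on which NTP client is installed; the following sketch probes the common tools in order, so the exact commands available on your distribution may differ.

```shell
# Check whether the host clock is NTP-synchronized.
# The available tool varies by distribution, so probe in order.
if command -v chronyc >/dev/null 2>&1; then
    status=$(chronyc tracking 2>/dev/null)
elif command -v timedatectl >/dev/null 2>&1; then
    status=$(timedatectl show --property=NTPSynchronized 2>/dev/null)
fi
msg="NTP status: ${status:-unknown (install and enable an NTP client such as chrony)}"
echo "$msg"
```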
Multiple JVMs on a single host
Do not run multiple Stream service JVMs on a single host, because doing so reduces overall cluster resiliency and data availability if the entire host fails.
However, if such a setup is required, configure dedicated, non-conflicting ports for each Stream service JVM.
The Stream service uses three IP address and port pairs for internal communication. Assign a distinct set of ports for each JVM on a single host.
<!-- IP and port for communication between Pega nodes -->
<env name="dsm/services/stream/pyBrokerAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyBrokerPort" value="9092"/>
<!-- IP and port for configuration management -->
<env name="dsm/services/stream/pyKeeperAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyKeeperPort" value="2181"/>
<!-- Port for local Kafka management. Kafka JMX always runs on localhost -->
<env name="dsm/services/stream/pyJmxPort" value="9999"/>
<!-- Port for HTTP streaming -->
<env name="dsm/services/stream/pyPort" value="7003"/>
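For example, a second Stream service JVM on the same host could use the following entries in its own prconfig.xml; the port numbers are arbitrary examples, and any free, non-conflicting ports work:

```xml
<!-- Second Stream service JVM on the same host: each port is offset
     from the defaults so the two instances do not conflict (example values). -->
<env name="dsm/services/stream/pyBrokerPort" value="9093"/>
<env name="dsm/services/stream/pyKeeperPort" value="2182"/>
<env name="dsm/services/stream/pyJmxPort" value="10000"/>
<env name="dsm/services/stream/pyPort" value="7004"/>
```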
JVM heap size
It is unlikely that you need to increase the default JVM heap settings for your Stream service. However, if you do, use the following settings:
- Add an entry in the prconfig.xml file, for example:
  <env name="dsm/services/stream/pyHeapOptions" value="-Xmx4G -Xms4G" />
- Create a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/pyHeapOptions/default
- Value: for example, -Xmx4G -Xms4G
Garbage collector logs
The garbage collector removes discarded objects from the heap to free up allocation space. Garbage collector logs provide information about the memory cleaning process and help in identifying performance issues. By using the following settings, you can configure the name, count, and size of the log files.
- Add an entry in the prconfig.xml file, for example:
  <env name="dsm/services/stream/pyGcLogOptions" value="-Xlog:gc*:file=kafkaServer-gc.log:time,tags:filecount=10,filesize=102400"/>
- Create a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/pyGcLogOptions/default
- Value: for example, -Xlog:gc*:file=kafkaServer-gc.log:time,tags:filecount=10,filesize=102400
JVM tuning
You can tune the garbage collection performance of the JVM. In the following example, the default Garbage-First (G1) garbage collector is selected and the maximum pause time for garbage collection is set to 20 milliseconds.
- Add an entry in the prconfig.xml file, for example:
  <env name="dsm/services/stream/pyJvmPerformanceOptions" value="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20"/>
- Create a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/pyJvmPerformanceOptions/default
- Value: for example, -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20
Multiple availability zones
Spread your Stream nodes across multiple availability zones. To distribute data replicas evenly across availability zones, use the following settings to configure the availability zone names:
- Add an entry in the prconfig.xml file, for example:
  <env name="dsm/services/stream/server_properties/broker.rack" value="AZ-1" />
- Create a dynamic system setting with the following options:
- Owning Ruleset: Pega-Engine
- Setting Purpose: prconfig/dsm/services/stream/server_properties/broker.rack/default
- Value: for example, AZ-1
General settings in the server.properties file
You can set general settings in the server.properties file by using the following format:
<env name="dsm/services/stream/server_properties/property" value="value"/>
where:
- property is the name of the property that you want to modify.
- value is the value of that property.
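For example, to change how long the Kafka broker retains data, you could set log.retention.hours, which is a standard Kafka broker property; the value here is only an example:

```xml
<!-- Example: keep Kafka data for 7 days (168 hours) -->
<env name="dsm/services/stream/server_properties/log.retention.hours" value="168"/>
```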