
This content has been archived and is no longer being updated.


Partition keys in Stream Data Set rules

Updated on August 31, 2018

Beginning with Pega 7.3, you can define a set of partition keys when you create a Data Set rule of type Stream. Partition keys are useful when you analyze data across multiple nodes, because they help ensure that all related records are grouped together.
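The grouping effect of partition keys can be illustrated with a minimal sketch. This is not Pega code: the function names, the hash-based assignment, and the CustomerID property are assumptions chosen only to show how records that share partition key values end up in the same partition.

```python
import hashlib
from collections import defaultdict


def partition_for(record, partition_keys, num_partitions):
    """Map a record to a partition number from its partition key values.

    Hypothetical helper: the platform's real assignment logic is not
    described in this article. A stable hash is used so that the same
    key values always map to the same partition.
    """
    # Join the key values into one string; "|" is an arbitrary separator.
    key = "|".join(str(record[name]) for name in partition_keys)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def group_by_partition(records, partition_keys, num_partitions):
    """Group records by partition so that related records stay together."""
    groups = defaultdict(list)
    for record in records:
        partition = partition_for(record, partition_keys, num_partitions)
        groups[partition].append(record)
    return dict(groups)


if __name__ == "__main__":
    events = [
        {"CustomerID": "C1", "Amount": 10},
        {"CustomerID": "C2", "Amount": 25},
        {"CustomerID": "C1", "Amount": 40},
    ]
    # Both C1 events are placed in the same group.
    print(group_by_partition(events, ["CustomerID"], 4))
```

Because the partition depends only on the key values, every record for a given customer is handled together, regardless of arrival order.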

Define partitioning only for testing purposes, that is, in application environments in which the Production level system setting is 1, 2, or 3. If you change the Production level setting to 4 or 5, any data set of type Stream that has at least one property defined as a partition key is no longer distributed across multiple nodes. In production-level applications (level 4 or 5), you can distribute the processing of data from stream data sets across multiple nodes only through your own custom setup, for example, by load balancing the requests that are sent to the node cluster.
Any change to the production level takes effect only after you restart the system.

Setting the production level in an application
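The production level rule and the restart requirement can be modeled as a small sketch. The class and method names below are hypothetical and do not correspond to any Pega API. The sketch covers only stream data sets that have at least one partition key, because that is the only case this article describes.

```python
class SystemSettings:
    """Toy model of the Production level system setting (hypothetical)."""

    # Levels at which a partitioned stream data set is distributed.
    TEST_LEVELS = {1, 2, 3}

    def __init__(self, production_level):
        self._validate(production_level)
        self.configured_level = production_level
        # The level that the running system actually uses.
        self.effective_level = production_level

    @staticmethod
    def _validate(level):
        if level not in {1, 2, 3, 4, 5}:
            raise ValueError("Production level must be between 1 and 5")

    def set_production_level(self, level):
        """Record a new level; it is not applied until restart()."""
        self._validate(level)
        self.configured_level = level

    def restart(self):
        """Apply the configured level, as a system restart would."""
        self.effective_level = self.configured_level

    def partitioned_stream_is_distributed(self):
        """Return True if a stream data set with partition keys is
        distributed across multiple nodes at the effective level."""
        return self.effective_level in self.TEST_LEVELS


if __name__ == "__main__":
    settings = SystemSettings(production_level=2)
    settings.set_production_level(5)
    print(settings.partitioned_stream_is_distributed())  # still level 2
    settings.restart()
    print(settings.partitioned_stream_is_distributed())  # now level 5
```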

You can use the properties defined in the Applies To class of the Data Set rule as partition keys. However, if the Data Flow rule that uses the stream data set as its source references an Event Strategy rule, you can define only a single partition key, and that key must be the same as the event key that you defined in the Real-time Data shape on the Event Strategy form.

Defining partition keys for an event stream
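The constraints on partition keys can be expressed as a simple check. This is a sketch under stated assumptions: validate_partition_keys, its parameters, and the property names in the usage example are hypothetical, and the platform performs its own validation when you save the rule.

```python
def validate_partition_keys(partition_keys, class_properties, event_key=None):
    """Return a list of problems with a partition key definition.

    partition_keys   -- property names chosen as partition keys
    class_properties -- property names defined in the Applies To class
    event_key        -- event key of the Event Strategy rule referenced by
                        the data flow, or None if there is no such rule
    """
    problems = []

    # Partition keys must be properties of the Applies To class.
    for name in partition_keys:
        if name not in class_properties:
            problems.append(f"{name} is not defined in the Applies To class")

    # With an Event Strategy rule, only one key is allowed, and it must
    # be the same as the event key.
    if event_key is not None:
        if len(partition_keys) > 1:
            problems.append("Only a single partition key is allowed")
        elif len(partition_keys) == 1 and partition_keys[0] != event_key:
            problems.append(
                f"Partition key {partition_keys[0]} must match "
                f"event key {event_key}"
            )

    return problems


if __name__ == "__main__":
    properties = {"CustomerID", "Amount", "Channel"}
    print(validate_partition_keys(["CustomerID"], properties, "CustomerID"))
    print(validate_partition_keys(["Channel"], properties, "CustomerID"))
```

An empty list means that the definition satisfies both constraints described above.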

Active data flows that reference stream data sets with at least one partition key continue processing when the node topology changes, for example, when a node fails or is removed from the cluster. Such a data flow adjusts to the new number of Data Flow service nodes, but any data that was not yet processed on the failed or disconnected node is lost.
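The behavior on a topology change can be pictured with the following sketch. It is an illustration only: the round-robin assignment, the function names, and the node names are assumptions, not a description of how the Data Flow service is implemented.

```python
def assign_partitions(num_partitions, nodes):
    """Spread partitions over the available nodes in round-robin order."""
    if not nodes:
        raise ValueError("At least one Data Flow service node is required")
    ordered = sorted(nodes)
    return {p: ordered[p % len(ordered)] for p in range(num_partitions)}


def handle_node_loss(assignment, in_flight, lost_node):
    """Reassign partitions after a node fails or leaves the cluster.

    assignment -- mapping of partition number to node name
    in_flight  -- mapping of node name to records not yet processed there
    Returns the new assignment and the records that are lost.
    """
    remaining = set(assignment.values()) - {lost_node}
    new_assignment = assign_partitions(len(assignment), remaining)
    # Records that were waiting on the lost node cannot be recovered.
    lost_records = list(in_flight.get(lost_node, []))
    return new_assignment, lost_records


if __name__ == "__main__":
    assignment = assign_partitions(4, ["node-a", "node-b"])
    pending = {"node-b": ["event-7", "event-9"]}
    new_assignment, lost = handle_node_loss(assignment, pending, "node-b")
    print(new_assignment)
    print(lost)
```

Processing continues on the remaining nodes with the new assignment, while the records that were pending on the lost node are reported as lost rather than retried.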

For more information, see Defining partition keys for stream data sets.
