Data Flow service

Pega Platform uses the Data Flow service to run data flows. Data flows are rules that sequence and combine data from various sources and write the results to a destination.
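The following sketch is a conceptual illustration of that source-to-destination shape only; Pega data flows are configured as rules, not written in Java, and the names Source, Destination, and DataFlow here are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Conceptual illustration only: these types are hypothetical stand-ins
// for the pieces a data flow rule wires together.
interface Source<T>      { List<T> read(); }
interface Destination<T> { void write(List<T> records); }

final class DataFlow<I, O> {
    private final Source<I> source;
    private final Function<I, O> transform;   // the sequencing/combining step
    private final Destination<O> destination;

    DataFlow(Source<I> source, Function<I, O> transform, Destination<O> destination) {
        this.source = source;
        this.transform = transform;
        this.destination = destination;
    }

    void run() {
        // Read from the source, apply the transformation, and
        // write the results to the destination.
        List<O> results = source.read().stream()
                                .map(transform)
                                .collect(Collectors.toList());
        destination.write(results);
    }
}
```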

Except for test runs, you must configure the Data Flow service before you can start a data flow. Test runs always run on the local decision data node. The Data Flow tab on the Services landing page lists the nodes on which data flow instances run.

The Data Flow service is divided into Batch and Real Time services to better handle different types of data flow runs. When you run a data flow, select whether to run it on the nodes of the Batch service or the Real Time service. Add or remove nodes to increase or decrease the capacity of each service. For example, you can add more nodes when you plan batch runs that require data-intensive computing. Data processing operations on the Batch nodes and the Real Time nodes are independent and do not affect each other.
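A minimal sketch of that Batch / Real Time split, assuming two independent node pools and a run that is dispatched to exactly one of them. Pega manages this internally; ServiceType, DataFlowService, and the method names here are illustrative, not Pega APIs.

```java
import java.util.ArrayList;
import java.util.List;

enum ServiceType { BATCH, REAL_TIME }

final class DataFlowService {
    private final List<String> batchNodes = new ArrayList<>();
    private final List<String> realTimeNodes = new ArrayList<>();

    // Adding or removing nodes scales one service without touching the other.
    void addNode(ServiceType type, String nodeId)    { pool(type).add(nodeId); }
    void removeNode(ServiceType type, String nodeId) { pool(type).remove(nodeId); }

    // A run is dispatched only to the pool selected for it, so batch
    // processing and real-time processing stay independent.
    void startRun(ServiceType type, String dataFlowName) {
        for (String node : pool(type)) {
            System.out.printf("Starting %s on %s node %s%n", dataFlowName, type, node);
        }
    }

    private List<String> pool(ServiceType type) {
        return type == ServiceType.BATCH ? batchNodes : realTimeNodes;
    }
}
```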

Depending on the partitioning configuration of a data flow instance, the data flow can process data on a different number of nodes than the number listed on the Data Flow tab. On that tab, you can also configure the number of Pega Platform threads to use for running data flow instances.
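The arithmetic below is an illustrative estimate only, under the assumption that a run cannot occupy more nodes than it has partitions and that the configured thread count applies per node; the actual scheduling is internal to Pega Platform.

```java
// Hypothetical estimate of how partitioning and the per-node thread
// setting combine; not a Pega API.
final class ParallelismEstimate {
    // Assumption: a run uses at most one node per partition, so extra
    // nodes in the service sit idle for that run.
    static int activeNodes(int partitions, int serviceNodes) {
        return Math.min(partitions, serviceNodes);
    }

    // Assumption: the configured thread count multiplies per-node parallelism.
    static int workerThreads(int partitions, int serviceNodes, int threadsPerNode) {
        return activeNodes(partitions, serviceNodes) * threadsPerNode;
    }

    public static void main(String[] args) {
        // Hypothetical run: 4 partitions, 6 Batch nodes, 2 threads per node.
        System.out.println(activeNodes(4, 6));        // 4 nodes do the work
        System.out.println(workerThreads(4, 6, 2));   // 8 threads in total
    }
}
```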