Configuring the Data Flow service

In the Data Flow service, you can run data flows in batch mode or real-time (stream) mode. Specify the number of Pega Platform threads that you want to use for running data flows in each mode.

Note: This procedure applies only to on-premises deployments.
Before you begin: Assign the Stream and Batch node types to Pega Platform nodes. To scale the Data Flow service horizontally, assign the corresponding node type to more nodes.

For more information, see Assigning node types to nodes for on-premises environments.

  1. In the header of Dev Studio, click Configure > Decisioning > Infrastructure > Services > Data flow.
  2. In the Service list, select the node types for which you want to configure the number of threads.
    Batch nodes process batch data flow runs. Real-time nodes process streaming data flows.
  3. In the Data flow nodes section, click Edit settings.
  4. In the Thread count field, enter the number of threads that you want to use for running data flows in the selected mode.
    To scale the Data Flow service vertically, increase the current number of threads.
    For example: If you divide the source of a data flow into five partitions, Pega Platform divides the data flow run into five assignments. If at least five threads are available, Pega Platform processes the assignments simultaneously on separate threads.

    Pega Platform calculates the number of available threads by multiplying the thread count by the number of nodes. For example, with two nodes and the thread count set to 5, ten threads are available. A data flow run with five partitions uses five of those threads, and the other five remain idle.

  5. Click Submit.
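
The thread arithmetic described in step 4 can be sketched as a short calculation. The following Python snippet is only an illustration, not part of Pega Platform: the function name thread_usage and its parameters are hypothetical, and the snippet assumes that each partition assignment occupies exactly one thread, as in the example in step 4.

```python
def thread_usage(nodes: int, thread_count: int, partitions: int) -> dict:
    """Estimate thread usage for one data flow run (illustrative only).

    Assumption: one partition assignment is processed by one thread.
    """
    # Available threads = thread count per node multiplied by the number of nodes.
    available = nodes * thread_count
    # A run cannot use more threads than it has partitions,
    # nor more threads than are available.
    used = min(partitions, available)
    return {"available": available, "used": used, "idle": available - used}


# Two nodes, thread count set to 5, source divided into five partitions.
print(thread_usage(nodes=2, thread_count=5, partitions=5))
# prints {'available': 10, 'used': 5, 'idle': 5}
```

Under the same assumption, a run whose source has more partitions than there are available threads would use every thread, which is one way to estimate whether adding nodes (horizontal scaling) or raising the thread count (vertical scaling) is likely to help.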