
Creating a real-time run for data flows

Updated on July 5, 2022

Provide your decision strategies with the latest data by creating real-time runs for data flows with a streamable data set source, for example, a Kafka data set.

Before you begin:
  1. Start the Data Flow service.

    For more information, see Configuring the Data Flow service.

  2. Check in the data flow that you want to run.

    For more information, see Rule check-in process.

  1. In the header of Dev Studio, click Configure > Decisioning > Decisions > Data Flows > Real-time processing.
  2. On the Real-time processing tab, click New.
  3. On the New: Data Flow Work Item tab, associate a Data Flow rule with the data flow run:
    1. In the Applies to field, press the Down arrow key, and then select the class to which the Data Flow rule applies.
    2. In the Access group field, press the Down arrow key, and then select an access group context for the data flow run.
    3. In the Number of threads field, enter the number of threads to use per data flow node.
    4. In the Data flow field, press the Down arrow key, and then select the Data Flow rule that you want to run.
      The class that you select in the Applies to field limits the available rules.
    5. In the Service instance name field, select Real-time.
    6. In the Priority list, select the importance level for the run.
      For more information, see Data flow run priorities.
  4. Optional: To keep the run active and to restart the run automatically after every modification, specify the following settings:
    1. Select the Manage the run and include it in the application check box.
    2. In the Ruleset field, press the Down arrow key, and then select a ruleset that you want to associate with the run.
    3. In the Run ID field, enter a meaningful ID to identify the data flow run.
    Result: When you move the ruleset between environments, the system moves the run with the ruleset to the new environment and keeps it active.
  5. Optional: In the Additional processing section, specify any activities that you want to run before and after the data flow run.
  6. In the Resilience section, specify an error threshold for the data flow run. In the Fail the run after more than x failed records field, enter an integer greater than 0.
    After the number of failed records exceeds the threshold that you specify, the run stops processing data and the run status changes to Failed. If the run finishes with some failed records but does not exceed the threshold, the run continues to process data, and the run status then changes to Completed with failures.
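The resilience threshold described above can be pictured with a minimal sketch. The function and status names are invented for illustration and are not a Pega API; the comparison follows the "Fail the run after more than x failed records" field label:

```python
def final_run_status(failed_records: int, threshold: int) -> str:
    """Illustrative sketch of the resilience threshold (hypothetical
    helper, not part of Pega Platform)."""
    if failed_records > threshold:
        return "Failed"                    # the run stops processing data
    if failed_records > 0:
        return "Completed with failures"   # finished, but some records failed
    return "Completed"
```

For example, with a threshold of 5, a run with 3 failed records finishes as Completed with failures, while a run with 6 failed records stops as Failed.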
  7. In the Node failure section, specify how you want the run to proceed in case the node becomes unreachable:
    • To resume processing records on the remaining active nodes, from the last processed record that is captured by a snapshot, select Resume on other nodes from the last snapshot. If you enable this option, the run can process each record more than once.

      This option is available only for resumable data flow runs.

    • To resume processing records on the remaining active nodes from the first record in the data partition, select Restart the partitions on other nodes. If you enable this option, the run can process each record more than once.

      This option is available only for non-resumable data flow runs.

    • To terminate the data flow run and change the run status to Failed, select Fail the entire run.

      This option provides backward compatibility with previous Pega Platform versions.

    The available options depend on the type of data flow run.

    For more information about resumable and non-resumable data flow runs and their resilience, see the Data flow service overview article on Pega Community.
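Why both resume options can process a record more than once is easiest to see in a toy model. The sketch below is an assumption for illustration only (the names and index arithmetic are not Pega code): records processed between the last snapshot and the node failure are replayed on the remaining nodes.

```python
def resume_on_other_nodes(records, snapshot_pos, failure_pos):
    """Toy model of 'Resume on other nodes from the last snapshot'.

    Records before the failure were already processed once; after the
    failure, processing restarts at the last snapshot, so every record
    between the snapshot and the failure is processed a second time.
    """
    first_pass = records[:failure_pos]    # processed before the node was lost
    second_pass = records[snapshot_pos:]  # replayed on the remaining nodes
    return first_pass + second_pass

processed = resume_on_other_nodes(list(range(10)), snapshot_pos=4, failure_pos=7)
duplicates = sorted(r for r in set(processed) if processed.count(r) > 1)
# records 4, 5, and 6 are processed twice
```

For the non-resumable option, Restart the partitions on other nodes, there is no snapshot to resume from: the same model with snapshot_pos=0, so the whole partition is replayed.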

  8. For resumable data flow runs, in the Snapshot management section, specify how often you want the Data Flow service to take snapshots of the last processed record from the data flow source.
    If you set the Data Flow service to take snapshots more frequently, you reduce the chance of reprocessing records after a failure, but you can also lower system performance.
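The trade-off can be stated roughly: after a failure, at most the records processed since the most recent snapshot are replayed. Assuming a steady ingest rate, the replay bound shrinks with the snapshot interval. The helper below is hypothetical, not part of the product:

```python
def max_replayed_records(records_per_second: float, snapshot_interval_s: float) -> float:
    """Rough upper bound on records reprocessed after a failure:
    everything ingested since the most recent snapshot."""
    return records_per_second * snapshot_interval_s

# At 200 records/s, a 10-second interval risks replaying up to 2000
# records; a 1-second interval lowers that bound to 200, at the cost
# of more frequent snapshot overhead.
```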
  9. If your data flow references an Event Strategy rule, configure the state management settings:
    1. Expand the Event strategy section.
    2. Optional: To specify how you want the incomplete tumbling windows to act when the data flow run stops, in the Event emitting section, select one of the available options.
      By default, when the data flow run stops, all the incomplete tumbling windows in the Event Strategy rule emit the collected events. For more information, see Event Strategy rule form - Completing the Event Strategy tab.
    3. In the State management section, specify how you want the Data Flow service to process data from event strategies:
      • To keep the event strategy state in running memory and write the output to a destination when the data flow finishes its run, select Memory.

        If you select this option, the Data Flow service processes records faster, but you can lose data in the event of a system failure.

      • To periodically replicate the state of an event strategy in the form of key values to the Cassandra database that is located in the Decision Data Store, select Database.

        If you select this option, you can fully restore the state of an event strategy after a system failure, and continue processing data.

    4. In the Target cache size field, specify the maximum size of the cache for state management data.
      The default value is 10 megabytes.
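The Memory/Database choice is a classic speed-versus-durability trade-off. The sketch below models it with invented class names (not a Pega API): the Memory option keeps state only in running memory, while the Database option periodically replicates key values so the last replicated state survives a failure.

```python
class MemoryStateStore:
    """Sketch of the Memory option: fastest, but state is lost on failure."""
    def __init__(self):
        self._state = {}

    def put(self, key, value):
        self._state[key] = value

    def recover_after_failure(self):
        return {}  # running memory is gone; the event strategy restarts empty


class DatabaseStateStore:
    """Sketch of the Database option: key values are periodically
    replicated (to Cassandra in the Decision Data Store in the real
    service), so the last replicated state can be restored."""
    def __init__(self):
        self._state = {}
        self._replica = {}

    def put(self, key, value):
        self._state[key] = value

    def replicate(self):
        self._replica = dict(self._state)  # periodic replication point

    def recover_after_failure(self):
        return dict(self._replica)  # restore the last replicated state
```

Anything written after the last replication point is still lost in the Database sketch, which is why replication is periodic rather than per-record: full per-record durability would cost too much throughput.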
  10. Click Done.
    Result: The system creates a real-time run for your data flow and opens a new tab with details about the run. The run does not start automatically.
  11. Click Start.
    Result: The real-time data flow run starts.
  12. Optional: To analyze the run life cycle during or after a run and troubleshoot potential issues, review the life cycle events:
    1. On the Data flow run tab, click Run details.
    2. On the Run details tab, click View Lifecycle Events.
      Result: The system opens a new window with a list of life cycle events. Each event includes a set of details, for example, the reason for the event. For more information, see Event details in data flow runs on Pega Community.
      Note: By default, Pega Platform displays events from the last 10 days. You can change this value by editing the dataflow/run/lifecycleEventsRetentionDays dynamic data setting.
    3. Optional: To export the life cycle events to a single file, click Actions, and then select a file type.
