The use of streaming data sets in data flows

Updated on July 5, 2022

The configuration of a streaming data set, such as Kafka, Kinesis, or Stream, can affect the life cycle of the records consumed by the data flow runs that use that data set. Use the following information to prevent duplicate processing or loss of records during those runs.

When configuring a streaming data set as the data flow source, users can set the Read options to either Only read new records or Read existing and new records.

These options change the behavior of the streaming data set when starting a data flow run from the beginning, whether by creating a new data flow run or restarting an existing one.

When users select Read existing and new records, the data flow run starts by reading all records that already exist in the data set.

When users select Only read new records, the data flow run skips any existing records and reads only the records that appear in the data set after the data flow run reaches the In-progress status.
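The data flow engine manages record positioning for you, but as a rough analogy, these Read options behave like the auto.offset.reset setting of a plain Kafka consumer that has no committed offsets yet: "earliest" corresponds to Read existing and new records, and "latest" corresponds to Only read new records. The broker address, group ID, and topic name in the following sketch are hypothetical, and the sketch illustrates the analogy only, not how the data flow run itself is implemented.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadOptionsAnalogy {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "data-flow-run-1");          // hypothetical group ID
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // "Read existing and new records" is analogous to starting from the earliest offset:
        // props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // "Only read new records" is analogous to starting from the latest offset,
        // so records already on the topic are skipped:
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events")); // hypothetical topic name
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```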

Figure: A Stream data set configured as a source, with the Only read new records option selected in the Source configuration window.

Users can choose to Start, Stop, Continue, or Restart a data flow run. For more information, see Managing data flow runs.

Regardless of the selected Read options, if a user chooses to Continue a stopped run, the data flow resumes where it left off and processes the records received after the Stop.

If a user chooses to Restart the run, the outcome depends on the selected Read options (see the sketch after this list):

  • If a user selected Only read new records, the records sent between the Stop and the Restart are ignored. For example, a new data flow run on an existing Kafka topic with this setting completely disregards the records already on that topic.
  • If a user selected Read existing and new records, all existing records in the data set are reprocessed, potentially resulting in duplicates. For example, a new run on an existing Kafka topic with this setting processes all existing records on that topic.
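As an illustration only, and assuming a Kafka streaming data set, the Continue and Restart behaviors roughly map to how a plain Kafka consumer can position itself on a topic: Continue keeps the committed offsets, Restart with Only read new records seeks to the end of the topic, and Restart with Read existing and new records seeks back to the beginning. The broker, group ID, and topic name below are hypothetical, and the sketch shows the analogy, not the data flow engine's actual implementation.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RunLifecycleAnalogy {

    enum RunAction { CONTINUE, RESTART_NEW_ONLY, RESTART_EXISTING_AND_NEW }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "data-flow-run-1");          // hypothetical group ID
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        RunAction action = RunAction.RESTART_EXISTING_AND_NEW;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events")); // hypothetical topic name
            // Poll once so partitions are assigned before any seek call.
            consumer.poll(Duration.ofMillis(500));

            switch (action) {
                case CONTINUE:
                    // Continue: keep the committed offsets, so processing resumes with the
                    // records received after the Stop.
                    break;
                case RESTART_NEW_ONLY:
                    // Restart + "Only read new records": jump past everything already on the
                    // topic, so records sent between Stop and Restart are ignored.
                    consumer.seekToEnd(consumer.assignment());
                    break;
                case RESTART_EXISTING_AND_NEW:
                    // Restart + "Read existing and new records": reread the whole topic,
                    // which reprocesses existing records and can produce duplicates.
                    consumer.seekToBeginning(consumer.assignment());
                    break;
            }

            consumer.poll(Duration.ofSeconds(1))
                    .forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
        }
    }
}
```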
