Data set rule types
Learn about the types of data set rules that you can create in Pega Platform. Your data set configuration depends on the data set type that you select.
Database Management (DBM)
You can create the following data set records to store and manage data in an internal database table or in an external backing database, such as Cassandra or HBase:
- Database Table
- The Database Table data set allows you to query data stored in the relational database internal to Pega Platform or in an external database, such as Cassandra. You can quickly access the data by using a particular key. For more information, see Creating a Database Table data set that uses the relational database and Connecting to an external Cassandra database through a Database Table data set.
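For orientation, the following JDBC sketch shows the kind of key-based lookup that a keyed read against a mapped table involves. This is an informal illustration, not the data set's internals; the connection string, table, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class KeyLookup {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table; a Database Table data set
        // performs an equivalent keyed read against its mapped table.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/pega", "user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT customer_id, name FROM customer_data WHERE customer_id = ?")) {
            ps.setString(1, "CUST-42");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}
```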
- Decision Data Store
- The Decision Data Store data set stages data for fast decision management. You can use it to quickly access data by using a particular key. Define the keys when you create a Decision Data Store data set.
- The keys that you specify in a data set define the data records managed in the backing database (for example, a Cassandra database). Add as many keys as necessary, and map each key to a property.
- The first property in the list of keys is the partitioning key used to distribute data across different decision nodes. To keep the decision nodes balanced, make sure that you use a partitioning key property with many distinct values.
- Changing keys in an existing data set is not supported. To use different keys, create a new data set instance.
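To see why key cardinality matters for balance, consider this simplified Java sketch. It hashes a partitioning key to pick a node; Cassandra actually distributes data with a token ring rather than a plain modulo, but the balancing intuition is the same, and all names and values here are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionDemo {
    // Simplified stand-in for how a partitioning key spreads records
    // across decision nodes.
    static int nodeFor(String partitionKey, int nodeCount) {
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("CUST-1", "CUST-2", "CUST-3", "CUST-4", "CUST-5");
        Map<Integer, Integer> recordsPerNode = new TreeMap<>();
        for (String key : keys) {
            recordsPerNode.merge(nodeFor(key, 3), 1, Integer::sum);
        }
        // A key with many distinct values yields an even spread; a
        // low-cardinality key (for example, a country code) would pile
        // most records onto a few nodes.
        System.out.println(recordsPerNode);
    }
}
```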
- HBase
- The HBase data set reads data from and saves data to an external Apache HBase store. You can use this data set as a source and a destination in Data Flow rule instances. For configuration details, see Creating HBase data set and HDFS and HBase client and server versions supported by Pega Platform.
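For context, the following sketch uses the standard Apache HBase Java client to write and then read a record by row key, which is the kind of keyed round trip involved when HBase serves as a destination and a source; the table and column family names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical table and column family names.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customers"))) {
            // Write one record keyed by row key, as a destination would.
            Put put = new Put(Bytes.toBytes("CUST-42"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read it back by key, as a source would.
            Result result = table.get(new Get(Bytes.toBytes("CUST-42")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));
        }
    }
}
```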
File system
You can create the following data set records to read and write data from and into files:
- File
- The File data set is a tool for reading and writing data from and to files. You can use this data set type for the following use cases:
- To read from a file in the CSV or JSON format that you upload and to store the content of the file in a compressed form in the pyFileSourcePreview clipboard property. You can use this data set as a source in Data Flow rule instances to test data flows and strategies. For configuration details, see Creating a File data set record for embedded files.
- To read data from files in a repository or write data from Pega Platform to files in a repository. You can use this data set as a source or a destination in Data Flow rule instances. For configuration details, see Creating a File data set record for files on repositories.
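As an informal illustration of the first use case, the following Java sketch reads a CSV file and maps each row to named values, roughly the way rows map to named properties; the file name is hypothetical, and the naive comma split is a simplification (quoted fields need a real CSV parser).

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvPreview {
    public static void main(String[] args) throws Exception {
        // Hypothetical file with a header row followed by data rows.
        try (BufferedReader reader = Files.newBufferedReader(Path.of("customers.csv"))) {
            String[] header = reader.readLine().split(",");
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split for illustration only.
                String[] values = line.split(",");
                for (int i = 0; i < header.length && i < values.length; i++) {
                    System.out.printf("%s=%s ", header[i], values[i]);
                }
                System.out.println();
            }
        }
    }
}
```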
- HDFS
- The HDFS data set reads data from and saves data to an external Apache Hadoop Distributed File System (HDFS). You can use this data set as a source and a destination in Data Flow rule instances. It supports partitioning, so you can create distributed runs with data flows. Because this data set does not support the Browse by key option, you cannot use it as a joined data set. For configuration details, see Creating HDFS data set and HDFS and HBase client and server versions supported by Pega Platform.
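For orientation, this sketch reads a file through the standard Hadoop FileSystem API, the kind of access involved when HDFS acts as a source; the namenode URI and file path are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode URI and file path.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                 fs.open(new Path("/data/customers/part-00000")), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```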
General
- Monte Carlo
- The Monte Carlo data set is a tool for generating any number of random data records for a variety of information types. When you create an instance of this data set, it is filled with varied and realistic-looking data. You can use this data set as a source in Data Flow rule instances, for example, for testing in the absence of real data. For configuration details, see Creating Monte Carlo data set.
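To make the idea concrete, the following standalone sketch generates a few random but realistic-looking records, similar in spirit to the records that such a data set produces; the field names and value pools are invented for illustration.

```java
import java.util.Random;
import java.util.UUID;

public class MonteCarloSketch {
    private static final String[] FIRST_NAMES = {"Alice", "Bob", "Carol", "Dan", "Eve"};
    private static final String[] CITIES = {"Boston", "Dublin", "Krakow", "Sydney"};

    public static void main(String[] args) {
        Random random = new Random();
        // Emit a handful of synthetic records with plausible-looking values.
        for (int i = 0; i < 5; i++) {
            System.out.printf("id=%s name=%s city=%s age=%d%n",
                UUID.randomUUID(),
                FIRST_NAMES[random.nextInt(FIRST_NAMES.length)],
                CITIES[random.nextInt(CITIES.length)],
                18 + random.nextInt(60));
        }
    }
}
```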
- Visual Business Director
- The Visual Business Director data set stores data that you can view in the Visual Business Director planner to assess the success of your business strategy. To save data records in the Visual Business Director data set, you can, for example, set it as a destination of a data flow. One instance of the Visual Business Director data set, called Actuals, is always present in the Data-pxStrategyResults class and contains all Interaction History records. For more information on Interaction History, see the Pega Community article Interaction History data model. For configuration details, see Creating Visual Business Director data set.
Social
Pega Platform no longer supports the data set types in the Social category:
- Facebook data sets
- YouTube data sets
These features are deprecated and will be removed in future Pega Platform versions. Do not create any data sets using the Facebook or YouTube types.
Stream
You can create the following data set records for processing real-time data:
- Kafka
- The Kafka data set connects to Apache Kafka, a high-throughput, low-latency platform for handling real-time data feeds, and you can use it as input for event strategies in Pega Platform. Kafka data sets are characterized by high performance and horizontal scalability in terms of event and message queuing. Kafka data sets can be partitioned to enable load distribution across the Kafka cluster. You can use a data flow that is distributed across multiple partitions of a Kafka data set to process streaming data. For configuration details, see Creating a Kafka configuration instance and Creating a Kafka data set.
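For reference, this sketch publishes one record to a Kafka topic with the standard Apache Kafka Java producer; a Kafka data set points at a topic of this kind through its Kafka configuration instance. The broker address, topic name, and payload are hypothetical. Note that the record key drives partition assignment, which in turn drives how a distributed data flow run splits the work.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaFeed {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address; keys and values are plain strings here.
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record key determines the partition the record lands in.
            producer.send(new ProducerRecord<>("customer-events", "CUST-42",
                "{\"event\":\"page_view\",\"customerId\":\"CUST-42\"}"));
        }
    }
}
```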
- Kinesis
- The Kinesis data set connects to an instance of Amazon Kinesis Data Streams to get data records from it. Kinesis Data Streams captures, processes, and stores high volumes of data in real time, such as IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. The data records in a stream are distributed into groups that are called shards. For more information on Amazon Kinesis Data Streams, see the Amazon Web Services (AWS) documentation. For configuration details, see Creating a Kinesis data set.
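As background on the stream side, this sketch writes one record to a Kinesis stream with the AWS SDK for Java v2; the region, stream name, partition key, and payload are hypothetical. The partition key selects the shard, and shards are what allow records to be read in parallel.

```java
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class KinesisPut {
    public static void main(String[] args) {
        // Hypothetical region and stream name.
        try (KinesisClient kinesis = KinesisClient.builder()
                .region(Region.US_EAST_1)
                .build()) {
            // The partition key determines which shard receives the record.
            kinesis.putRecord(PutRecordRequest.builder()
                .streamName("clickstream")
                .partitionKey("CUST-42")
                .data(SdkBytes.fromUtf8String("{\"event\":\"click\"}"))
                .build());
        }
    }
}
```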
- Stream
- The Stream data set processes a continuous data stream of events (records). Use a Pega REST connector rule to populate the Stream data set with external data. The Stream data set also exposes REST and WebSocket endpoints, but Pega recommends that you use a Pega REST connector rule instead whenever possible. You can use the default load balancer to test how Data Flow rules that contain Stream data sets are distributed in multinode environments by specifying partitioning keys. For configuration details, see Creating a Stream data set.
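Purely as an illustration of pushing an event over HTTP, the following sketch POSTs one JSON record with the standard Java HttpClient. The host and endpoint path are hypothetical placeholders, not the actual endpoint format; check the Stream data set record for the endpoint that it exposes, and, as noted above, prefer a Pega REST connector rule where possible.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StreamPost {
    public static void main(String[] args) throws Exception {
        // Hypothetical host and endpoint path for a Stream data set.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://pega-host/prweb/stream/CustomerEvents"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"event\":\"page_view\"}"))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```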