Data Set rule form - Completing the Data set tab

Define the keys.

The Database Table section displays the database table name that the class is mapped to.
In the Selectable Keys section, add as many keys as necessary, and map each key to a property.
In the Partitioning key section, select the property used to split the data into as many equal segments as possible across DNodes.
- To ensure a balanced distribution, select a property that is suitable for partitioning. For example, assuming a table containing customer information, country information is a suitable property for partitioning because it contains enough shared distinct values, but email address is not because it typically has as many distinct values as customer entries.
- Another consideration is the correlation between number of segments (the grouped distinct values delivered by the property) and number of DNodes. The ideal distribution is considered to be as many segments as DNodes.

Note: You can create this data set when you have at least one DNode in the cluster.

This data set stages data for fast decisioning. You can use it when you want to access data very quickly by using a particular key.

When you create an instance of this data set, you need to define the keys.

The keys that you specify in a data set define the data records managed in the Cassandra internal storage. Add as many keys as necessary and map each key to a property.
The first property in the list of keys is the partitioning key used to distribute data across different decision nodes. To keep the decision nodes balanced, make sure that you use a partitioning key property with many distinct values.
Changing keys in an existing data set is not supported. You have to create another instance.

HBase

The HBase data set is designed to read and save data from an external Apache HBase storage. This data set can be used as a source and destination in the Data Flow rules instances.

For configuration details, see Configuring HBase data set.

HDFS

The HDFS data set is designed to read and save data from an external Apache Hadoop File System (HDFS). This data set can be used as a source and destination in the Data Flow rules instances. It supports partitioning so you can create distributed runs with data flows. Becasue this data set does not support the Browse by key option, you cannot use it as a joined data set.

For configuration details, see Configuring HDFS data set.

Stream

This type of data set allows you to process continuous data stream of events (records).

Stream tab

The Stream tab contains details about the exposed services (REST and WebSocket). These exposed services handle stream data set as a resource located at http://<HOST>:7003/stream/<DATA_SET_NAME>, for example: http://10.30.27.102:7003/stream/MyEventStream

Settings tab

The Settings tab allows you to set additional options for your stream data set. After saving the rule instance, you cannot change the settings.

Authentication

The REST and WebSockets endpoints are secured by using the Pega 7 Platform common authentication scheme. Each post to the stream requires authenticating with your user name and password. By default the Enable basic authentication check box is selected.

In the Retention period field, you specify how long the data set keeps the records. The default value is 1 day.

In the Log file size field, you specify the size of the log files, between 10 MB and 50 MB. The default value is 10MB.

Visual Business Director

No configuration required. The data set instance is automatically configured with the Visual Business Director server location as defined by the Visual Business Director connection.

Facebook

Create this data set when you want to connect with the Facebook API. Reference the data set from a data flow and use the Free Text Model rule to analyze text-based content of Facebook posts. The Facebook data set allows you to filter out Facebook posts according to the keywords you specify in it.

Creating an instance of Facebook data set

Prerequisites:

Register on a website for Facebook developers and create a Facebook app. The app is necessary to obtain App ID and App secret details to be used with the Facebook data set.

Note: Do not use one instance of the Facebook data set in multiple data flows. Stopping one of the data flows, stops the Facebook data set in other data flows.

In the App explorer, click on Data Model > Data Set.
Right-click Facebook and click Create.
Name the data set.
From the Type drop down, select Facebook.
In the Apply to (class) field, select the class where you want to create the data set.
Click Create and open.
In the Facebook tab, complete the Access details section with the information from your Facebook app:
- App ID
- App secret
- Facebook Page Token
Note: To ensure that the Facebook connectors always fetch new Facebook posts, you must provide a valid Facebook Page Token.
In the Facebook page URL's section, click Add URL and type the name of the Facebook page or pages for which you want to analyze text-based content.
Optional: In the Authors section, click Add author and type a user's name or users' names whose posts you want to ignore.

Note: When specifying numerous keywords and authors, take into consideration the Facebook Graph API limitations. For more information, read documentation about the Graph API.
Click Save.

Twitter

Create this data set when you want to connect with the Twitter API. Reference the data set from a data flow and use the Free Text Model rule to analyze text-based content of tweets. The Twitter data set allows you to filter out tweets according to the keywords you specify in it.

Note: Do not use one instance of the Twitter data set in multiple data flows. Stopping one of the data flows, stops the Twitter data set in other data flows.

Creating an instance of Twitter data set

Prerequisites:

Register a Twitter app for your Twitter account on the Twitter apps portal. This is necessary to obtain OAuth settings (Consumer key, Consumer secret, Access token, Access token secret) to be used with the Twitter data set.
Optional: Obtain Klout score API key from the Klout website. Klout is a widely used metrics that determines the breadth and strength of one's online social influence.

In the App explorer, click on Data Model > Data Set.
Right-click Twitter and click Create.
Name the data set.
From the Type drop down, select Twitter.
In the Apply to (class) field, select the class where you want to create the data set.
Click Create and open.
In the Twitter tab, complete the Access details section with the OAuth settings from the Twitter app that you created:
- Consumer key
- Consumer secret
- Access token
- Access token secret
Optional: Provide Klout score API key.
Optional: In the Keywords section, click Add keyword and type the words that you want to find in the tweets.

In the Keywords section, you can also type Twitter authors (for example @JohnSmith) that you want to find in tweets.
Optional: In the Timeline section, click Add author and type a user's name or users' names whose tweets you want to analyze.

Note: It is recommended to complete the Keywords or Timeline section. If you leave both of them empty, you analyze all the tweets on the platform.
Optional: In the Authors section, click Add author and type a user's name or users' names whose tweets you want to ignore.

Note: When specifying numerous keywords and authors, take into consideration Twitter Rest API limitations. For more information, read documentation about the Twitter's REST APIs.
Click Save.

YouTube

Create this data set when you want to connect with the YouTube Data API. Reference the data set from a data flow and use the Free Text Model rule to analyze metadata of the YouTube videos. The YouTube data set allows you to filter out metadata of the YouTube videos according to the keywords you specify in it.

Note: Do not use one instance of the YouTube data set in multiple data flows. Stopping one of the data flows, stops the YouTube data set in other data flows.

Creating an instance of YouTube data set

Prerequisites:

Obtain Google API key from the Google developers website. This key is necessary to configure the YouTube data set and get access to the YouTube data.

In the App explorer, click on Data Model > Data Set.
Right-click YouTube and click Create.
Name the data set.
From the Type drop down, select Twitter.
In the Apply to (class) field, select the class where you want to create the data set.
Click Create and open.
In the YouTube tab, provide the Google API key.
Optional: Select the Retrieve video URL check box.

If metadata of a particular YouTube video contains the keywords you specify, this option retrieves the URL of this video.
Optional: Select the Retrieve comments check box.

If metadata of a particular YouTube video contains the keywords you specify, this option retrieves all the users' comments about this video.
In the Keywords section, click Add keyword and type the keyword or keywords that you want to find in the video metadata. The metadata containing the keywords undergo text analysis.
Optional: In the Authors section, click Add author and type a user's name or users' names whose video you want to ignore.

Note: When specifying numerous keywords and authors, take into consideration YouTube Data API limitations. For more information, read documentation about the YouTube Data API.
Click Save.

Data Set rule form Completing Data Sets
About New Data Set History

Data Set rule formCompleting Data Sets

Data Set rule form
Completing Data Sets