Analyzing the metadata of YouTube videos in Pega 7.1.9

Updated on September 6, 2017

The Pega 7 Platform provides text analytics so that users can analyze text-based content such as news feeds, emails, and postings on social media streams including Facebook, Twitter, and YouTube. Use this tutorial to learn how to configure the Pega 7 Platform to analyze the metadata of YouTube videos.

The social video-sharing platform YouTube provides several tools for community interaction. Users can upload videos and make them available for others to watch. When uploading a video, users can provide metadata that helps index the video. The metadata includes titles, keywords, descriptions, tags, author's name, and categories.

Using the text analytics capability of the Pega 7 Platform, you can analyze the video metadata for particular keywords. You can also retrieve video URLs and comments to gather community feedback and infer implicit knowledge about users, videos, and community interests. Such information can provide strategic insights and influence enterprise decisions.

- Prerequisites
- Creating an instance of the YouTube data set
- Creating an instance of the Free Text Model rule
- Creating an instance of the Data Flow rule
- Analyzing the metadata of YouTube videos

This tutorial takes approximately 30-40 minutes to complete.

Prerequisites

- Obtain a Google API key from the Google Developers website. This key is necessary to configure the YouTube data set and to access YouTube data.
- Add the PEGA-NLP ruleset to your application.
- If you run the Pega 7 Platform on IBM WebSphere Application Server or Oracle WebLogic Server, configure the Signer and SSL Certificate settings. Without this configuration, the YouTube data set does not work.

Creating an instance of the YouTube data set

Create and configure a YouTube data set called YouTubeData to establish a connection with the YouTube Data API.

Do not use one instance of the YouTube data set in multiple data flows. If you stop one of the data flows, the YouTube data set in other data flows is also stopped.

1. Click the Application menu in Designer Studio and switch to your application.
2. In the App Explorer, click <app_name> > Data Model > Data Set.
3. Right-click YouTube, and click Create.
4. Name the data set YouTubeData.
5. From the Type list, select YouTube.
6. Specify the context where you want to create the data set:
   - In the Apply to (class) field, select the Data-Social-YouTube class.
7. Click Create and open.
8. On the YouTube tab, provide the Google API key.
9. Optional: Select the Retrieve video URL check box.
   If the metadata of a particular YouTube video contains the keywords that you specify, this option retrieves the URL of this video.
10. Optional: Select the Retrieve comments check box.
   If the metadata of a particular YouTube video contains the keywords that you specify, this option retrieves all the user comments about the video.
11. In the Keywords section, click Add keyword and type the keyword or keywords that you want to find in the video metadata. The metadata that contains the keywords undergoes text analysis.
12. Optional: In the Authors section, click Add author and type the names of one or more users whose videos you want to ignore.
13. Click Save.
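
The effect of the Keywords and Authors settings can be pictured as a simple predicate over a video's metadata. The following is a minimal sketch for illustration only, not the Pega implementation: the `VideoMetadata` fields, the case-insensitive substring matching, and the `should_analyze` helper are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class VideoMetadata:
    """Hypothetical container for some metadata fields named in this tutorial."""
    title: str
    description: str
    author: str
    tags: list = field(default_factory=list)


def should_analyze(video, keywords, ignored_authors):
    """Return True when the video passes the (assumed) keyword and author rules."""
    # Videos from ignored authors are skipped regardless of keywords.
    if video.author.lower() in {a.lower() for a in ignored_authors}:
        return False
    # Assumption: a keyword matches if it occurs anywhere in the metadata text.
    haystack = " ".join([video.title, video.description, *video.tags]).lower()
    return any(k.lower() in haystack for k in keywords)
```

For example, a video titled "Pega 7 demo" by author "alice" would pass for the keyword "demo", but would be skipped if "alice" were listed in the Authors section.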

When you specify many keywords and authors, take the limitations of the YouTube Data API into account. For more information, read the documentation about the YouTube Data API.
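
To get a feel for why the number of keywords matters, the sketch below builds a keyword search request for the YouTube Data API v3 `search.list` endpoint and makes a rough quota estimate. It is an illustration under stated assumptions: this tutorial does not say which API calls the YouTube data set issues, and the one-search-per-keyword polling model, the `build_search_url` and `estimated_daily_units` helpers, and the default cost of 100 units per search are assumptions to verify against the current YouTube Data API quota documentation.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(keyword, api_key, max_results=25):
    """Build a search.list request URL for one keyword (no request is sent)."""
    params = {
        "part": "snippet",          # return title, description, channel info
        "q": keyword,               # the search term
        "type": "video",            # restrict results to videos
        "maxResults": max_results,
        "key": api_key,             # the Google API key from the prerequisites
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def estimated_daily_units(num_keywords, polls_per_day, units_per_search=100):
    """Rough quota estimate, assuming one search call per keyword per poll."""
    return num_keywords * polls_per_day * units_per_search
```

Under these assumptions, five keywords polled once an hour come to 5 × 24 × 100 = 12,000 units per day, which would exceed a daily allowance of 10,000 units.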

Creating an instance of the Free Text Model rule

Create a Free Text Model rule called SampleModel and configure it to analyze sentiment only. For more information, see the Free Text Model rule.

1. In the Records Explorer, click Decision > Free Text Model.
2. Click Create.
3. Name the rule SampleModel.
4. Specify the context where you want to create the rule:
   - In the Apply to (class) field, select the Data-Social-YouTube class.
     You do not need to create the SampleModel rule in the same class as the YouTubeData data set, but it must be in the Data-Social-YouTube class hierarchy. You can use the top-level class or the base class.
5. Click Create and open.
6. Enable sentiment analysis:
   1. Select the Enable sentiment analysis check box.
   2. In the Lexicon field, select pySentimentLexicon.
   3. In the Sentiment model field, select pySentimentModels.
7. Click the I/O Mapping tab.
8. In the Input text field, set the .pyText property.
9. In the Outcome field, set the .NLPOutcome property.
   Create the property if it does not exist. It must be a single-page property defined on the Data-NLP-Outcome class.
10. Click Save.

Creating an instance of the Data Flow rule

Create a data flow called NLPProcess to reference the SampleModel rule and to process the metadata of the YouTube videos that are handled by the YouTube data set.

You need to create the NLPProcess data flow in the same class as the YouTubeData data set.

1. In the Records Explorer, click Data Model > Data Flow.
2. Click Create.
3. Name the rule NLPProcess.
4. Specify the context where you want to create the rule:
   - In the Apply to (class) field, select the Data-Social-YouTube class.
5. Click Create and open.
6. Double-click the Source shape.
   1. In the Source properties dialog box, from the Source list, select Data set.
   2. From the Data set list, select YouTubeData and click Submit.
7. Navigate to the Source shape and click the green add icon.
8. From the list, select Free Text Model.
9. Double-click the Free Text Model shape.
   1. In the Free Text Model properties dialog box, in the Free Text Model field, reference the SampleModel rule.
   2. Click Submit.
10. Navigate to the Free Text Model shape and click the green add icon.
11. From the list, select Filter.
12. Double-click the Filter shape.
   1. Name the shape Sentiment.
   2. In the Filter conditions section, specify the following condition: .NLPOutcome.pyOverallSentiment = "negative"
      The outcome property that you use in the filter must be the same as the one that you specified in the SampleModel rule.
   3. Click Submit.
13. Click the Destination shape.
   1. In the Destination properties dialog box, from the Destination list, select Activity.
   2. In the Activity field, reference the following activity: pxSaveSummaryForReporting.
   3. Click Submit.
14. Click Save.
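
The shape sequence of the NLPProcess data flow (Source, Free Text Model, Filter, Destination) can be sketched as a small pipeline. This is a conceptual illustration only: the word-list scorer is a toy stand-in for the SampleModel rule and is not the Pega sentiment algorithm, the "positive" and "neutral" labels are assumptions (the tutorial only shows "negative"), and the `destination` callback stands in for the pxSaveSummaryForReporting activity.

```python
# Toy lexicon, assumed for illustration; not pySentimentLexicon.
NEGATIVE_WORDS = {"bad", "awful", "broken", "hate"}
POSITIVE_WORDS = {"good", "great", "love", "helpful"}


def toy_sentiment_model(record):
    """Stand-in for SampleModel: reads pyText, writes NLPOutcome.pyOverallSentiment."""
    words = record["pyText"].lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {**record, "NLPOutcome": {"pyOverallSentiment": sentiment}}


def nlp_process(source, destination):
    """Source -> Free Text Model -> Filter (negative only) -> Destination."""
    for record in source:
        scored = toy_sentiment_model(record)
        # Mirrors the filter condition .NLPOutcome.pyOverallSentiment = "negative"
        if scored["NLPOutcome"]["pyOverallSentiment"] == "negative":
            destination(scored)
```

Calling `nlp_process(records, saved.append)` with a list of records that each have a `pyText` key collects only the negatively scored records in `saved`, which matches the intent of the Sentiment filter shape.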