Analyzing social media content through data flows
Complete this tutorial to use the Pega Platform™ text mining capabilities to analyze the text data that is posted on Facebook, Twitter, and YouTube through a data flow. By using data flows, you can extract large amounts of text data in real-time, process it with text analyzers, and save that data in a data set for further processing or analysis, for example, in NLP Sample, Pega's natural language processing application.
The NLP Sample application showcases the text analytics capabilities of Pega Platform. This application is available as an archive file that you can download and install. The application includes a portal and a sample text analyzer with text samples, taxonomies, and lexicons that you can explore or extend. For more information, see Exploring text analytics with the NLP Sample application.
Social media platforms contain unstructured data in the form of status messages, posts, comments, and so on. By analyzing this data, you can structure it and derive usable business information to deliver better services to customers and increase your customer base. For example, by tracking and analyzing tweets about your organization or your products, you can discover whether the customer response to the release of one of your products was positive or negative. By quickly detecting, responding to, and addressing your customers' needs and issues you can retain and grow your customer base.
Pega Platform provides a collection of techniques that you can use to process and structure text data from social media platforms:
- Sentiment analysis – Detect and analyze the feelings (attitudes, emotions, and opinions) that characterize a unit of text, for example, to find out whether an online product review was positive or negative. Knowledge about customers' sentiments can be important. Customers are often influenced by opinions of others that they find online when they make buying decisions.
- Classification analysis – Assign one or more classes or categories to a text sample to make it easier to manage and sort. By classifying customer queries into various categories, the relevant information can be accessed more quickly, which increases the speed of customer support response times.
- Entity extraction analysis – Extract named entities from text data and assign the detected entities to predefined categories such as names of organizations, locations, people, quantities, or values. This type of analysis can help you track the activity of your customers and competitors or discover the products or features that customers comment on most often. You can combine entity extraction with sentiment analysis to detect whether the opinions about particular products or services are negative or positive.
- Intent analysis – Identify user intentions within the context of an email, instant message, microblog post, and so on. For example, you can determine whether the person intends to buy or recommend your product.
Configuring text analytics in your application
Perform the following tasks to configure text analytics in your application:
- Creating a Data Set rule for social media.
- Creating a Text Analyzer rule.
- Creating a Data Flow rule for processing text data.
Creating a Data Set rule for social media
One of the most important tasks in establishing the infrastructure for text analytics is connecting your application with the Facebook, Twitter, or YouTube API through Data Set rules. Each social media platform that you can mine for text data has a corresponding data set type in your application. Before you can connect your application with Facebook, Twitter or YouTube, you must register your application to obtain authentication credentials.
Creating a Text Analyzer rule
Use Text Analyzer rules to process the text data that your application sources from a Facebook, Twitter, or YouTube data set. You can use a variety of tools for analyzing and structuring the text data to obtain the business intelligence that is vital to accomplishing your business goals, such as identifying and responding to dissatisfied customers, discovering business trends, and so on.
- In the Explorer panel, click Records >Decision.
- Right-click Text Analyzer, and click Create.
- On the Create form, provide the following information for the new rule:
- Enter the rule label.
- Specify the ruleset, Applies To class, and ruleset version.
- Click Create and open.
- On the Select analysis tab of the Text Analyzer form, configure one or more of the following options:
- Configure sentiment analysis – Define the sentiment lexicons and models to use for detecting opinions expressed within the analyzed text.
- Configure classification analysis – Define the taxonomy (that is, a collection of predefined categories that are associated with specific keywords) to use for detecting the categories that text data can be assigned to.
- Configure entity extraction analysis – Define topics, entity extraction models, and entity extraction rules to extract only the data that pertains to your subjects of interest.
- Configure intent analysis – Select the applicable intent analysis model that can detect various intent types.
Data scientists can create and train custom models for sentiment and classification analysis in the Decision Analytics work area. By using the model creation wizard, you define the type of model and the training algorithm, upload training and testing data, train the model, and review its accuracy. You can use the generated models (as decision data binary files) in text analysis in your application. You can also export the models.
- On the I/O mapping tab of the Text Analyzer form, configure the following parameters:
- Input text: .pyText
- Outcome: .NLPOutcome
- On the Advanced tab of the Text Analyzer form, configure the following settings:
- Configure the language settings – For Twitter only, control how your application detects the text language. For example, you can specify that the language is detected by your application or by the provider of the text data.
- Configure the sentiment settings – Adjust the score range for sentiment detection. For example, by narrowing down the score range of the negative sentiment, you can identify only the most negative feedback that needs to be responded to quickly.
- Configure the spelling checker – Enable the spelling checker to increase the confidence score of the data that you categorize, to make sure that it is categorized accurately.
- Configure classification settings – Define the granularity level for text classification (sentence level or document level). For example, select the sentence granularity level to classify smaller units of text such as comments or tweets.
Creating a Data Flow rule for processing text data
After you create and configure a social media data set that tracks the relevant data, and configure a Text Analyzer rule to process and structure that data, you can create a Data Flow rule to extract and process the text data. Minimally, this data flow must reference a social media data set as the source, a text analyzer to process the extracted text data, and a destination. The destination of the data flow can be an activity that writes the data flow results to a database.
- In the Explorer panel, click Records > Data Model.
- Right-click Data Flow, and click + Create.
- On the Create form, provide the following information for the new rule:
- Enter the rule label.
- Specify the ruleset, Applies To class, and ruleset version. The Applies To class of the Data Set and Data Flow rules must be the same.
- Click Create and open.
- Double-click the Source shape and perform the following actions:
- In the Source properties dialog box, from the list, select
- From the Submit. list, select a social media data set, and click
- Click the connector that radiates from the Source shape, and select Text Analyzer from the list.
- Double-click the Text Analyzer shape and perform the following actions:
- In the Text Analyzer properties dialog box, in the field, select your Text Analyzer rule.
- Click Submit.
- Double-click the Destination shape and perform the following actions:
- From the Destination list, select a destination type, for example, Activity.
- From the list that corresponds to the selected destination type, select a destination. For example, if you select Activity as the destination type, you can select the pxSaveSummaryForReporting activity to write the results to a database.
- Click Submit.
You can enrich your data flow with additional shapes, depending on your business objective. For example, if you analyze Twitter data, you can add a Filter shape between the Source and Text Analyzer shapes with the .pyKloutScore >= 70 condition to analyze only tweets from the most influential members of the audience. - Save the rule.
- In the Data Flows landing page, start a data flow run that references the Data Flow rule that you created to process the text data.
A Data Flow rule that contains a Text Analyzer for classifying call context
For more information, see About Data Flow rules and Data Flows landing page.
In this tutorial, you created a data set to collect text data from social media platforms such as Facebook, Twitter, or YouTube and configured that data set to fetch only the data that is relevant to your business objectives. You created a Text Analyzer rule and customized the analytics processes to apply to the collected text data. Finally, you grouped all the rules that are relevant for your text analytics process in a data flow and arranged them in a processing pattern.
Previous topic Analyzing natural language with text analytics Next topic Analyzing WhatsApp content in NLP Sample in real time