Analyzing WhatsApp content in NLP Sample in real time
You can use the NLP (Natural Language Processing) Sample application to analyze text data from WhatsApp in real time. WhatsApp is a popular application that is used for chatting, exchanging short messages, and sharing images or videos. WhatsApp can be an important source of business intelligence, because this application is often used to express opinions and sentiments about various products and services.
In this tutorial, you create the Pega 7 Platform infrastructure that supports text analysis of WhatsApp content. You create a data class that contains the required rule instances. In that class, you create a stream data set whose exposed services are used to gather WhatsApp records in real time. These records are transformed into clipboard pages by a data transform and are later processed by a text analyzer that is configured for sentiment, classification, and entity extraction analysis. You also design a processing pattern for your rules by using a data flow. Finally, you trigger the record extraction and analysis by starting a data flow run.
Prerequisites
Before you start this tutorial, do the following tasks:
- Register a user account in the WhatsApp application that represents your company or your services. The text messages that are sent to this account are the source of content for the analysis.
- Install the NLP Sample application. For more information, see Exploring text analytics with the NLP Sample application.
- Create a Java connector that extracts WhatsApp messages as soon as they are posted and forwards them to the stream data set. You can also use a third-party application, for example, SaySimple, to send or receive WhatsApp data.
Analyzing WhatsApp content
- Create the WhatsApp class and associated properties
- Create a stream data set
- Create a Data Transform rule to convert WhatsApp JSON records
- Create a Text Analyzer rule to analyze the text content
- Create and trigger a Data Flow rule that contains the rules for analyzing WhatsApp records
- Test your configuration
Creating the WhatsApp class and associated properties
Create a WhatsApp subclass within the Data-Social parent class of your application. The subclass stores the clipboard properties of WhatsApp records. For detailed information about creating classes and properties, see Class rules - Completing the Create, Save As, or Specialization form and Properties - Completing the Create, Save As, or Specialization form.
- In the Explorer panel of Designer Studio, click App.
- Use the Applications search field to navigate to the Data-Social class of your application.
- Under the Data-Social class, create a Data-Social-WhatsApp class.
- Navigate to the Embed class of your application.
- Under the Embed class, create an Embed-Social-WhatsApp class.
- In the Data-Social-WhatsApp class, create the following clipboard properties (property names are case-sensitive):
- content – Page mode property. Its string type is Text. References the Embed-Social-WhatsApp class as its page definition.
- Text – Single Value mode property. Its string type is Text.
- In the Embed-Social-WhatsApp class, create a clipboard property called text (case-sensitive) whose mode is Single Value and the string type is Text.
The clipboard properties that you created correspond to the fields in the incoming WhatsApp JSON records, for example:
{
  "content": {
    "text": "Your phone support operators are just rude. No doubt about it!"
  },
  "from": "[email protected]",
  "messageId": "b212895dcc91cb1a5f0dbf54bba0789f3d7adb3f",
  "name": "John",
  "time": 1428482233,
  "to": "[email protected]",
  "type": "text"
}
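To illustrate how the JSON fields line up with the properties, the following Python sketch parses a record of the same shape and reads the nested content.text value. It is only an illustration and is not part of the Pega configuration; the sender and recipient values are placeholders, because the addresses in the sample record are not shown.

```python
import json

# Record with the same shape as the sample WhatsApp JSON.
# The "from" and "to" values are placeholders.
raw_record = """
{
  "content": {"text": "Your phone support operators are just rude. No doubt about it!"},
  "from": "sender-placeholder",
  "messageId": "b212895dcc91cb1a5f0dbf54bba0789f3d7adb3f",
  "name": "John",
  "time": 1428482233,
  "to": "recipient-placeholder",
  "type": "text"
}
"""

record = json.loads(raw_record)

# "content" is a nested object, which is why the content property is a Page
# that points to Embed-Social-WhatsApp; "text" is a single value inside it.
message_text = record["content"]["text"]
print(message_text)
```

The nested lookup mirrors the content.text path that the data transform reads later in this tutorial.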
Creating a stream data set
Create a data set of type Stream in your application to analyze the WhatsApp messages as soon as they are posted.
- In the Explorer panel of Designer Studio, click App.
- In the Applications search field, enter Data-Social-WhatsApp.
- Right-click Data-Social-WhatsApp, and click + Create > Data Model > Data Set.
- Enter a label for the data set.
- From the Type list, select Stream.
- Click Create and open.
- On the Stream tab of the data set form, review the information about the available services that are used to populate the data set. Each stream data set contains information about the REST and WebSocket services that handle a stream data set as a resource that is located at:
- For REST – http://<HOST>:7003/stream/<DATA_SET_NAME>, for example, http://10.1.1.13:7003/stream/WhatsAppStream
- For WebSocket – ws://<HOST>:7003/stream/<DATA_SET_NAME>, for example, ws://10.1.1.13:7003/stream/WhatsAppStream
Use the provided REST or WebSocket addresses as the destination in the Java or other third-party connector that relays messages from WhatsApp to the Pega 7 Platform.
- Optional: On the Settings tab, configure the following settings:
- Require basic authentication – Enable this setting to require authentication for each incoming record. The records are authenticated with your user name and password.
- Log file size – Specify the size of the log files, between 10 MB and 50 MB. The default value is 10 MB.
- Retention period – Specify how long the data set keeps the records. The default value is 1 day.
After you save the rule, you cannot change any settings.
- Click Save.
For more information, see Data Set rule form - Completing Data Sets.
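As a rough sketch of how a connector or script might address the stream data set, the following Python example builds the REST and WebSocket addresses from a host and data set name, and prepares a request that carries one JSON record. The function names are hypothetical, and the use of HTTP POST with a JSON content type is an assumption that you should verify against the services listed on the Stream tab. The request is built but not sent.

```python
import base64
import json
import urllib.request

STREAM_PORT = 7003  # port shown in the service addresses on the Stream tab


def stream_urls(host, data_set_name):
    """Return the REST and WebSocket addresses for a stream data set."""
    path = f"{host}:{STREAM_PORT}/stream/{data_set_name}"
    return {"rest": f"http://{path}", "websocket": f"ws://{path}"}


def build_post_request(host, data_set_name, record, user=None, password=None):
    """Build (but do not send) a request that carries one JSON record.

    Assumption: the REST service accepts an HTTP POST with a JSON body.
    """
    request = urllib.request.Request(
        stream_urls(host, data_set_name)["rest"],
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if user is not None and password is not None:
        # Only needed when "Require basic authentication" is enabled.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


request = build_post_request(
    "10.1.1.13", "WhatsAppStream", {"content": {"text": "Hello"}}
)
print(request.full_url)
# To send the record: urllib.request.urlopen(request)
```

Keeping the address construction in one function makes it easier to switch between hosts or data sets without touching the code that sends the records.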
Creating a Data Transform rule
Create a data transform to convert JSON fields of WhatsApp records into a clipboard page that contains the text property.
- In the Explorer panel of Designer Studio, click App.
- In the Applications search field, enter Data-Social-WhatsApp.
- Right-click Data-Social-WhatsApp, and click + Create > Data Model > Data Transform.
- Enter a label for the data transform.
- Click Create and open.
- On the Definition tab of the data transform form, do the following actions:
- In the Action column, select Set.
- In the Target column, enter .pyText.
- In the Source column, enter primary.content.text.
- Click Add a row.
- In the Action column, select Set.
- In the Target column, enter .pySource.
- In the Source column, enter "WhatsApp".
- Click Save.
WhatsApp data transform
For more information, see Data Transforms.
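The effect of the two Set rows can be pictured with a short Python sketch. This is an analogy only: the function name is hypothetical, and a Python dictionary stands in for the clipboard page.

```python
def apply_whatsapp_transform(record):
    """Mimic the two Set rows of the data transform on a parsed JSON record."""
    page = dict(record)  # shallow copy so the incoming record is left unchanged
    page["pyText"] = record["content"]["text"]  # Set .pyText from content.text
    page["pySource"] = "WhatsApp"               # Set .pySource to a constant
    return page


page = apply_whatsapp_transform({"content": {"text": "Great service, thanks!"}})
print(page["pyText"], "|", page["pySource"])
```

After the transform, the message text is available in pyText, which is the property that the text analyzer reads as its input.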
Creating a Text Analyzer rule
Use Text Analyzer rules to process the WhatsApp text data that your application sources from the stream data set. You can use a variety of tools for analyzing and structuring the text data to obtain the business intelligence that is vital to accomplishing your business goals, such as identifying and responding to dissatisfied customers or discovering business trends.
- In the Explorer panel of Designer Studio, click Records.
- Expand the Decision list.
- Right-click Text Analyzer, and click + Create.
- On the Create form, provide the following information for the new rule:
- Enter a label for the text analyzer.
- In the Apply to field, press the Down Arrow key, and select Data-Social.
- Specify the ruleset and ruleset version.
- Click Create and open.
- On the Select analysis tab of the Text Analyzer form, configure one or more of the following options:
- Configure sentiment analysis settings – Define the sentiment lexicons and models to use for opinion mining.
- Configure classification analysis settings – Define the taxonomy (a collection of predefined categories that are associated with specific keywords) to use for detecting the categories that text data can be assigned to.
- Configure entity extraction analysis settings – Define topics, entity extraction models, and entity extraction rules to extract only the data that is of interest.
Use the Text Analytics landing page to create and train custom models for sentiment and classification analysis. Using a wizard, you define the type of model and the training algorithm. You also upload training and testing data, train the model, and review its accuracy. You can use the models (as decision data binary files) in text analysis. You can also export the models.
- On the I/O mapping tab of the Text Analyzer form, configure the following parameters:
- Input text – .pyText
- Outcome – .pyOutcome
- On the Advanced tab of the Text Analyzer form, configure the settings for the analysis types that you enabled on the Select analysis tab:
- Configure the language settings – Control how your application detects the language of the text data.
To have the language detected by the source provider (if available), select the Language detected by publisher check box.
- Configure the sentiment settings – Adjust the score range for sentiment detection. For example, by narrowing down the score range of the negative sentiment, you can identify only the most negative feedback that needs to be responded to quickly.
- Configure spelling checker settings – Enable the spelling checker to increase the confidence score of the data that you categorize (that is, the data is categorized more accurately).
- Configure classification settings – Define the granularity level for text classification (sentence level or document level).
- Click Save.
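To see why narrowing the negative score range isolates the most negative feedback, consider the following Python sketch. It assumes sentiment scores between -1 and 1 and uses hypothetical range boundaries; the actual ranges and defaults on the Advanced tab may differ, and the function is not the way that the Text Analyzer rule computes sentiment.

```python
def label_sentiment(score, negative_below=-0.25, positive_above=0.25):
    """Map a sentiment score to a label using hypothetical range boundaries."""
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"


scores = [-0.9, -0.4, 0.0, 0.6]

# Wider negative range: both -0.9 and -0.4 count as negative.
default_labels = [label_sentiment(s) for s in scores]

# Narrower negative range: only the strongest negative score is flagged.
strict_labels = [label_sentiment(s, negative_below=-0.75) for s in scores]

print(default_labels)
print(strict_labels)
```

With the narrower range, mildly negative messages fall into the neutral band, so only the records that need a quick response remain labeled as negative.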
Creating a Data Flow rule
Combine the rules that you created into the processing pattern of a data flow.
- In the Explorer panel of Designer Studio, click App.
- In the Applications search field, enter Data-Social-WhatsApp.
- Right-click Data-Social-WhatsApp, and click + Create > Data Model > Data Flow.
- Enter a label for the data flow.
- Click Create and open.
- Double-click the Source shape and do the following actions:
- In the Source properties dialog box, from the Source list, select Data set.
- From the Data set list, select the WhatsApp stream data set and click Submit.
- Click the connector that radiates from the Source shape and select Data Transform.
- Double-click the Data Transform shape and do the following actions:
- Enter the shape name.
- In the Data Transform field, press the Down Arrow key and select the WhatsApp data transform.
- Click Submit.
- Click the connector that radiates from the Data Transform shape, and select Text Analyzer from the list.
- Double-click the Text Analyzer shape, and do the following actions:
- In the Text Analyzer field, select the Text Analyzer rule that you created.
- Click Submit.
- Double-click the Destination shape and do the following actions:
- From the Destination list, select Activity.
- From the Activity list, select pxSaveSummaryForReporting. By selecting this activity, you can see the analyzed records and the results of text analysis in NLP Sample.
- Click Submit.
- Click Save.
- On the Data Flows landing page, start a real-time data flow run that references the Data Flow rule that you created to process the WhatsApp data.
Data flow pattern for analyzing WhatsApp records
Testing your configuration
You can test your configuration by using third-party software (for example, Postman) or scripts to send sample JSON records that mimic WhatsApp messages. If the configuration is correct, the records appear as successfully processed on the Data Flow Run page.
Sending records for testing
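If you prefer a script to a REST client, a minimal Python sketch such as the following can generate test records with the same fields as the sample WhatsApp JSON. The function name, the message texts, and the sender and recipient values are made up for illustration; the script only prints the JSON bodies and leaves the sending step to the client of your choice.

```python
import json
import time
import uuid


def make_test_record(text, sender_name="Test user"):
    """Build one record with the same fields as the sample WhatsApp JSON."""
    return {
        "content": {"text": text},
        "from": "test-sender",        # placeholder value
        "messageId": uuid.uuid4().hex,
        "name": sender_name,
        "time": int(time.time()),
        "to": "test-recipient",       # placeholder value
        "type": "text",
    }


sample_texts = [
    "Your phone support operators are just rude. No doubt about it!",
    "The new billing page is much easier to use.",
]
test_records = [make_test_record(text) for text in sample_texts]

for record in test_records:
    # Each line is one JSON body that can be sent to the stream data set.
    print(json.dumps(record))
```

Mixing clearly positive and clearly negative texts makes it easier to check the sentiment results later in the NLP Sample application.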
To verify the accuracy of the text analyzer, you can also access the NLP Sample application, view the test records, and inspect the analysis results.
Analyzing the test records