Improved Free Text Model rule type
In Pega 7.2.1, the Free Text Model rule type includes several enhancements that extend its functionality and improve its performance.
These enhancements include:
- The conditional random field (CRF) method for named-entity extraction, which increases the precision of entity recognition and reduces the number of false positives.
- Support for custom entities that are defined as Rule-based Text Annotation (Ruta) scripts.
- A playback option for retrieving up to a specified number of tweets from a particular time period.
- Time-based retrieval of posts in Facebook data sets.
Integration of CRF in entity extraction
The Pega 7 Platform now uses the conditional random field (CRF) method for named-entity recognition instead of the OpenNLP method. The CRF method uses a sequence classifier that can perform entity extraction with greater accuracy. It also reduces the number of false positives in entity extraction analysis compared to OpenNLP. The default entity extraction models that support the CRF are pyLocation, pyOrganization, and pyPerson.
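To illustrate why a sequence classifier can label entities more accurately than scoring each token in isolation, the following is a minimal sketch of Viterbi decoding over a linear-chain model. It is not the Pega implementation: the label set, the `emission` function, and the `TRANSITION` weights are hypothetical and hand-set for illustration, whereas a real CRF learns its weights from training data.

```python
# Toy linear-chain decoding with Viterbi. All scores are hand-set assumptions;
# a trained CRF would learn them from annotated text.
LABELS = ["O", "PERSON", "LOCATION"]

def emission(token, label):
    """Hypothetical score for assigning a label to a single token."""
    capitalized = token[0].isupper()
    if label == "O":
        score = -0.5 if capitalized else 0.5
    else:
        score = 1.0 if capitalized else -2.0
    if label == "PERSON" and token in {"Alice", "Bob"}:
        score += 2.0
    if label == "LOCATION" and token in {"Boston", "Paris"}:
        score += 2.0
    return score

# Hypothetical scores for moving from one label to the next (default 0.0).
TRANSITION = {
    ("PERSON", "PERSON"): 1.0,
    ("LOCATION", "LOCATION"): 1.0,
    ("PERSON", "LOCATION"): -1.0,
    ("LOCATION", "PERSON"): -1.0,
}

def viterbi(tokens):
    """Return the highest-scoring label sequence for a non-empty token list."""
    best = [{lab: emission(tokens[0], lab) for lab in LABELS}]
    back = []
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for lab in LABELS:
            # Best previous label given that the current token gets `lab`.
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION.get((p, lab), 0.0))
            scores[lab] = best[-1][prev] + TRANSITION.get((prev, lab), 0.0) + emission(tok, lab)
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi("Alice Smith visited Boston".split()))
# → ['PERSON', 'PERSON', 'O', 'LOCATION']
```

In this sketch, the token "Smith" scores equally as a person or a location on its own. The transition score from the preceding PERSON label resolves the ambiguity, which is the kind of context that a per-token classifier does not use.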
Support for custom entities
You can now add custom entities to a Free Text Model rule. Custom entities let you create and import entity extraction rules for entities that belong to a specific dictionary (for example, a certain product offering or bundle) or that match a certain pattern (for example, a specific type of identity number). Adding custom entities eliminates the need to train entity extraction models for such entities, which can be a time-consuming and complex process. You create each custom entity extraction rule as a Rule-based Text Annotation (Ruta) script and import it into the Pega 7 Platform as part of decision data. The following entity extraction rules are available by default:
- pyDate
- pyEmail
- pySalutation
- pySSN
Entity extraction rules section in the Free Text Model rule form
Detection of custom entities by a free text model rule
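The difference between dictionary-based and pattern-based entities can be sketched in a few lines of Python. This is an analogy only: it is not a Ruta script and does not reproduce the default Pega rules. The regular expressions, the `PRODUCTS` list, and the `extract_entities` function are assumptions made for illustration.

```python
import re

# Hypothetical pattern-based entities. The expressions are simplified examples,
# not the patterns used by the default pySSN or pyEmail rules.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# Hypothetical dictionary-based entity: a fixed list of product names.
PRODUCTS = ["Gold Bundle", "Silver Bundle"]

def extract_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples sorted by offset."""
    found = []
    # Pattern-based entities: anything that matches the expression.
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((name, m.group(), m.start()))
    # Dictionary-based entities: case-insensitive lookup of known terms.
    for product in PRODUCTS:
        for m in re.finditer(re.escape(product), text, flags=re.IGNORECASE):
            found.append(("PRODUCT", m.group(), m.start()))
    return sorted(found, key=lambda entity: entity[2])

sample = "Contact jane.doe@example.com about the gold bundle; SSN 123-45-6789."
for entity in extract_entities(sample):
    print(entity)
```

Neither kind of rule requires training data, which is why such entities are cheaper to define than a statistical extraction model.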
The playback option for Twitter data
You can now retrieve historical tweets from Twitter data sets by using the new playback option. When you select the playback option, you define the time period for which you want to retrieve historical Twitter data. You can also specify the maximum number of tweets to retrieve. Use the playback option to recover data when a streaming data set fails, for example, because the API disconnects, the server fails, or tweets cannot be retrieved in real time for any other reason.
Data recovery options in a Twitter data set
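The two playback settings, a time period and a maximum number of tweets, combine as in the following minimal sketch. It operates on an in-memory list and makes no Twitter API calls. The `playback` function, the `created_at` field name, and the oldest-first ordering are assumptions for illustration and do not describe how the data set is implemented.

```python
from datetime import datetime

def playback(records, start, end, max_items):
    """Return up to max_items records created within [start, end], oldest first."""
    in_window = [r for r in records if start <= r["created_at"] <= end]
    in_window.sort(key=lambda r: r["created_at"])
    return in_window[:max_items]

# Hypothetical stored tweets, not real data.
tweets = [
    {"id": "a", "created_at": datetime(2016, 5, 1, 9, 0)},
    {"id": "b", "created_at": datetime(2016, 5, 1, 10, 30)},
    {"id": "c", "created_at": datetime(2016, 5, 1, 11, 15)},
]
recovered = playback(tweets, datetime(2016, 5, 1, 10, 0), datetime(2016, 5, 1, 12, 0), max_items=1)
print([t["id"] for t in recovered])
```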
Time-based retrieval of posts from a Facebook data set
You can now use the new search functionality in a Facebook data set to retrieve only the posts that were submitted within a specific period of time, instead of all the posts that have been submitted since the Facebook page was created. This option gives you more control over the amount of data that you retrieve from a Facebook data set. You can also limit the amount of data that you retrieve for failure recovery purposes. For example, if a system outage lasted for one hour, you can configure the data set to retrieve only the posts that were submitted within the last hour.
Time-based retrieval of posts from a Facebook data set
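The one-hour outage example can be expressed as a small window calculation. The sketch below is illustrative only: `recovery_window`, `posts_in_window`, and the `created_time` field name are hypothetical, and the code filters an in-memory list rather than calling Facebook.

```python
from datetime import datetime, timedelta, timezone

def recovery_window(outage_duration, now=None):
    """Return (since, until) covering the period that ends now and spans the outage."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - outage_duration, now

def posts_in_window(posts, since, until):
    """Keep only the posts submitted within the window."""
    return [p for p in posts if since <= p["created_time"] <= until]

# A one-hour outage: retrieve only the posts from the last hour.
since, until = recovery_window(timedelta(hours=1))
print(since.isoformat(), until.isoformat())
```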