Creating entity extraction rules for text analytics
You can use the default decision data that contains entity extraction rules in Pega Platform to create custom rules for extracting entities from text.
With the natural language processing capabilities of Pega Platform, you can extract structured data from unstructured text. Structured data is any entity in the text that has a regular and predictable form, such as an email address, an account number, a time, or a monetary amount. Extracting structured information from sources such as emails or text messages lets you automate your response to a customer. For example, when a customer complains about missing luggage, you can automatically create a case that maps the detected entities to properties such as customer ID, flight number, or airport code.
To extract structured entities from text, Pega Platform integrates scripts that are based on the Apache UIMA Ruta annotation language. Pega Platform provides several entity extraction rules that you can use as part of the text analyzer rules in your application, such as pyAccountNumber, pyCaseID, and pyRelationship.
Tutorial
The following example use case explains how to detect unintentionally disclosed account numbers in tweets so that the numbers can be replaced with X characters before they are persisted. The example assumes that each account number consists of four 4-digit numbers that can be delimited by any character, for example, 1234-4567-8901-2345.
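Before you write the extraction script, it can help to prototype the account number format as a plain Java regular expression. The following sketch is not part of Pega Platform and is not a Ruta script. The class name AccountNumberMasker and the method name mask are hypothetical. The sketch assumes that the delimiter is a single non-digit character and that only the digits are replaced with X characters, so the delimiters are preserved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for prototyping the account number pattern outside Pega Platform.
final class AccountNumberMasker {

    // Four groups of exactly four digits, separated by one non-digit character.
    // Assumption: the delimiter is a single non-digit character.
    // The lookarounds prevent matches inside longer runs of digits.
    private static final Pattern ACCOUNT_NUMBER =
            Pattern.compile("(?<!\\d)\\d{4}(?:\\D\\d{4}){3}(?!\\d)");

    private AccountNumberMasker() {
    }

    // Returns the text with the digits of every detected account number replaced by X.
    static String mask(String text) {
        Matcher matcher = ACCOUNT_NUMBER.matcher(text);
        StringBuffer masked = new StringBuffer();
        while (matcher.find()) {
            // Replace only the digits so that the original delimiters are kept.
            String replacement = matcher.group().replaceAll("\\d", "X");
            // quoteReplacement guards against delimiters such as $ or \.
            matcher.appendReplacement(masked, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(masked);
        return masked.toString();
    }
}
```

For example, calling AccountNumberMasker.mask("My account is 1234-4567-8901-2345") returns "My account is XXXX-XXXX-XXXX-XXXX". Text that does not contain four delimited 4-digit groups is returned unchanged.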
Prerequisites
Before completing this tutorial, make sure that you understand the following concepts:
- The components and functionality of the Pega Platform text analytics feature. For more information, see Analyzing natural language with text analytics.
- Basic Java regular expressions.
- Development of Apache UIMA Ruta-based applications.
To create entity extraction rules for text analytics, perform the following procedures: