

This documentation site is for previous versions. Visit our new documentation site for current releases.

Creating entity extraction rules for text analytics

Updated on July 5, 2022

You can use the default Decision Data rules that contain entity extraction scripts in Pega Platform as a starting point for creating custom rules that extract entities from text.

With the natural language processing capabilities of Pega Platform, you can extract structured data from unstructured text. Structured data is any entity in the text that has a regular and predictable form, for example, an email address, an account number, a time, or a monetary amount. By extracting structured information from sources such as emails or text messages, you can, for example, respond to a customer's complaint about missing luggage by automatically creating a case and mapping the detected entities to case properties such as customer ID, flight number, or airport code.
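To make the idea of mapping detected entities to case properties concrete, the following minimal sketch uses plain Java regular expressions. It is only an illustration: in Pega Platform this work is done by Ruta-based entity extraction rules, not by this code. The class name EntityToPropertySketch, the property names Email and FlightNumber, and both patterns are hypothetical assumptions (for example, the flight-number pattern assumes a two-letter airline code followed by one to four digits).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityToPropertySketch {

    // Hypothetical mapping from case property names to regular expressions.
    // Pega Platform uses Ruta-based extraction rules for this; plain Java regexes
    // are used here only to illustrate the concept.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("Email", Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"));
        // Assumed format: two uppercase letters followed by 1 to 4 digits, e.g. LH1234.
        PATTERNS.put("FlightNumber", Pattern.compile("\\b[A-Z]{2}\\d{1,4}\\b"));
    }

    /** Returns the first match of each pattern, keyed by property name; properties with no match are omitted. */
    public static Map<String, String> extract(String text) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            if (m.find()) {
                properties.put(entry.getKey(), m.group());
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        String message = "My luggage from flight LH1234 is missing. Contact me at anna@example.com.";
        System.out.println(extract(message));
        // prints "{Email=anna@example.com, FlightNumber=LH1234}"
    }
}
```

A real extraction rule would typically also handle multiple occurrences, ambiguous matches, and context (for example, distinguishing a flight number from a product code), which is where an annotation language such as Ruta is more expressive than isolated regular expressions.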

To extract structured entities from text, Pega Platform uses scripts that are based on the Apache UIMA Ruta annotation language. Pega Platform provides several entity extraction rules, such as pyAccountNumber, pyCaseID, and pyRelationship, that you can use in the text analyzer rules of your application.

Tutorial

The following example use case explains how to detect account numbers that were unintentionally disclosed in tweets so that the numbers can be replaced with X characters before the tweets are persisted. The example assumes that each account number consists of four 4-digit groups that can be separated by any delimiter character, for example, 1234-4567-8901-2345.
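The detect-and-mask behavior described in this use case can be approximated with a Java regular expression, which the prerequisites below also assume familiarity with. The following is a minimal sketch, not the Ruta-based rule that this tutorial builds. The class name AccountNumberMasker and the method mask are hypothetical, the pattern assumes that exactly one non-digit character separates each 4-digit group, and Matcher.replaceAll(Function) requires Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccountNumberMasker {

    // Assumed account format: four 4-digit groups, each pair separated by exactly one
    // non-digit character (for example, '-' or ' '). The lookarounds prevent matches
    // inside a longer run of digits, such as a 5-digit first group.
    private static final Pattern ACCOUNT_NUMBER =
            Pattern.compile("(?<!\\d)\\d{4}\\D\\d{4}\\D\\d{4}\\D\\d{4}(?!\\d)");

    /** Replaces every digit of each detected account number with 'X' and keeps the delimiters. */
    public static String mask(String text) {
        return ACCOUNT_NUMBER.matcher(text)
                // quoteReplacement guards against delimiters such as '$' or '\'.
                .replaceAll(m -> Matcher.quoteReplacement(m.group().replaceAll("\\d", "X")));
    }

    public static void main(String[] args) {
        System.out.println(mask("Please refund 1234-4567-8901-2345 today"));
        // prints "Please refund XXXX-XXXX-XXXX-XXXX today"
    }
}
```

Keeping the delimiters preserves the shape of the original text; replacing each whole match with a fixed placeholder is an equally reasonable choice if even the format of the number should not be revealed.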

Prerequisites

Before completing this tutorial, make sure that you understand the following concepts:

  • The components and functionality of the Pega Platform text analytics feature. For more information, see Analyzing natural language with text analytics.
  • Basic Java regular expressions.
  • Development of Apache UIMA Ruta-based applications.

To create entity extraction rules for text analytics, perform the following procedures:

  1. Creating Decision Data rules that contain scripts for entity extraction

    Create a Decision Data rule that you can later modify and adjust to your business needs.

  2. Modifying Apache Ruta scripts to extract custom structured entities

    After you create a Decision Data rule for entity extraction, customize the existing Apache Ruta script to adjust it to your business needs.
