         This documentation site is for previous versions. Visit our new documentation site for current releases.      
 

Modifying Apache Ruta scripts to extract custom structured entities

Updated on July 5, 2022

After you create a Decision Data rule for entity extraction, customize the existing Apache Ruta script to adjust it to your business needs.

  1. On the Data tab of the Decision Data rule, click Open rule to access the Apache Ruta script.
  2. Modify the existing script based on the following example:
    PACKAGE uima.ruta.example;

    DECLARE VarA;
    DECLARE VarB;
    DECLARE VarC;
    DECLARE VarD;

    NUM{REGEXP("(^[0-9]{4})") -> MARK(VarA)}
    ANY?
    NUM{REGEXP("(^[0-9]{4})") -> MARK(VarB)}
    ANY?
    NUM{REGEXP("(^[0-9]{4})") -> MARK(VarC)}
    ANY?
    NUM{REGEXP("([0-9]{4})") -> MARK(VarD), MARK(EntityType, 1, 7),
        UNMARK(VarA), UNMARK(VarB), UNMARK(VarC), UNMARK(VarD)};

    Key points from the preceding code example:

    • DECLARE VarA; declares an annotation type for one part of the entity. In this use case, the entity consists of four strings of numbers separated by a delimiter character, so the script includes four DECLARE statements.
    • NUM{REGEXP("(^[0-9]{4})") -> MARK(VarB)} matches a number token that consists of four digits from 0 to 9. When a match is found, the token is marked as VarB. The caret character (^) anchors the regular expression to the beginning of the tested string, so the match must start at the first character of the number.
    • ANY? matches an optional single token between two number strings, for example, a delimiting character such as a hyphen (-) or a semicolon (;). The question mark makes the delimiter optional.
    • MARK(EntityType,1,7) merges the text matched by rule elements 1 through 7 (VarA, ANY?, VarB, ANY?, VarC, ANY?, VarD) into a single EntityType entity. An entity is detected only when all four regular expressions find a match.
    • UNMARK(VarD) removes the intermediate VarD annotation so that it does not overlap with the entity created from the merged annotations; UNMARK(VarA), UNMARK(VarB), and UNMARK(VarC) do the same for the other parts. For more information about regular expressions and the basic token hierarchy in Apache Ruta scripts, see the Apache UIMA Ruta Guide and Reference.
  3. Click Save.
  4. Test whether the script that you entered produces the expected results:
    1. On the Data tab of the Decision Data rule, click Test.
    2. In the Test window, enter or paste your sample text, and then click Test. If your custom script is correct, the detected entity is displayed in the Entity extraction section at the bottom of the Test window.
    [Image: Testing entity extraction. Example of an entity extraction test with correct results.]
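When you prepare sample text for the Test window, it can help to predict which strings the rule should detect. The following Python sketch approximates the matching behavior described in the key points for step 2. It is not Apache Ruta and does not run inside Pega. It assumes that each NUM element matches a whole run of exactly four digits, that ANY? matches at most one punctuation character, and that whitespace between tokens is skipped (Ruta hides whitespace tokens by default). The names FOUR_DIGITS, OPTIONAL_DELIMITER, ACCOUNT_NUMBER, and extract_entities are hypothetical.

```python
import re

# Hypothetical stand-in for one NUM{REGEXP(...)} element: exactly four digits.
# The lookarounds stop the pattern from splitting a longer digit run in two,
# which mimics Ruta matching whole NUM tokens.
FOUR_DIGITS = r"(?<!\d)(\d{4})(?!\d)"

# Hypothetical stand-in for ANY?: at most one punctuation character,
# with optional surrounding whitespace (assumed to be skipped, as in Ruta).
OPTIONAL_DELIMITER = r"\s*[^\w\s]?\s*"

# VarA, ANY?, VarB, ANY?, VarC, ANY?, VarD: the seven rule elements.
ACCOUNT_NUMBER = re.compile(OPTIONAL_DELIMITER.join([FOUR_DIGITS] * 4))


def extract_entities(text: str) -> list[str]:
    """Return only the merged spans, similar to MARK(EntityType, 1, 7).

    The four per-group captures (the VarA..VarD analogues) are not returned,
    similar to how UNMARK removes the intermediate annotations.
    """
    return [match.group(0) for match in ACCOUNT_NUMBER.finditer(text)]


if __name__ == "__main__":
    sample = "Transfer from 1234-5678-9012-3456 to 1111;2222;3333;4444 today."
    print(extract_entities(sample))
```

Because this is only an approximation, confirm the final behavior in the Test window. For example, the real ANY? element matches any single token, such as a whole word, which this sketch does not model.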
Result: In this tutorial, you created a Decision Data rule from an existing one. You also edited the attached Apache Ruta script to extract entities of a specific type to satisfy your business need of finding account numbers in the analyzed text.