Modifying Apache Ruta scripts to extract custom structured entities
After you create a Decision Data rule for entity extraction, customize its Apache Ruta script to meet your business needs.
- On the Data tab of the Decision Data rule, click Open rule to access the Apache Ruta script.
- Modify the existing script based on the following example:
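PACKAGE uima.ruta.example;
// Intermediate annotation types, one for each four-digit part of the entity
DECLARE VarA;
DECLARE VarB;
DECLARE VarC;
DECLARE VarD;
// A single rule with seven elements: NUM ANY? NUM ANY? NUM ANY? NUM
NUM{REGEXP("(^[0-9]{4})") -> MARK(VarA)}
ANY?
NUM{REGEXP("(^[0-9]{4})") -> MARK(VarB)}
ANY?
NUM{REGEXP("(^[0-9]{4})") -> MARK(VarC)}
ANY?
// Mark the last part, merge elements 1 to 7 into one entity, and remove the intermediate annotations
NUM{REGEXP("([0-9]{4})") -> MARK(VarD), MARK(EntityType,1,7),
    UNMARK(VarA), UNMARK(VarB), UNMARK(VarC), UNMARK(VarD)};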
Key points from the preceding code example:
- DECLARE VarA;
  Declares an annotation type for one part of the entity. In this use case, the entity consists of four strings of numbers that are separated by a delimiter character; therefore, the script contains four DECLARE statements (VarA, VarB, VarC, and VarD).
- NUM{REGEXP("(^[0-9]{4})") -> MARK(VarB)}
  Checks whether a number (NUM) token matches the regular expression [0-9]{4}, that is, four digits from 0 to 9. When a match is found, MARK(VarB) annotates the token as VarB. In the expressions for VarA, VarB, and VarC, the caret character (^) anchors the pattern to the beginning of the matched string.
- ANY?
  Optionally matches one token of any type that separates two numbers, for example, a hyphen (-) or a semicolon (;).
- MARK(EntityType,1,7)
  Merges the matches of rule elements 1 through 7 (VarA, ANY?, VarB, ANY?, VarC, ANY?, VarD) into a single EntityType annotation. Because all elements belong to one rule, an entity is detected only when all four regular expressions find a match.
- UNMARK(VarD)
  Removes the intermediate VarD annotation to prevent an overlap with the entity that results from the merged annotations. The UNMARK actions for VarA, VarB, and VarC serve the same purpose for the other parts.
For more information about regular expressions and the basic token hierarchy in Apache Ruta scripts, see the Apache UIMA Ruta Guide and Reference.
- Click Save.
- Test whether the script that you entered produces the expected results:
- On the Data tab of the Decision Data rule, click Test.
- In the Test window, enter or paste your sample text, and then click Test. If your custom script is correct, the detected entity is displayed in the Entity extraction section at the bottom of the Test window.
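To decide which sample texts to paste into the Test window, it can help to reason about what the example rule should detect. The following Python sketch is a rough approximation of that matching logic, not Ruta itself, and it runs outside of Pega. Everything in it is hypothetical: the tokenizer and the names tokenize, is_four_digit_num, and extract_entities are illustrative only. It also assumes that whitespace is skipped (modeled on Ruta's default token filtering), that ANY? consumes at most one non-numeric token, and that each part must be exactly four digits. If Ruta's REGEXP condition only searches for the pattern inside the token, longer numbers could also qualify, so verify that detail in the Ruta reference.

```python
import re
from typing import List, Tuple

# (type, text, begin offset, end offset) for one token.
Token = Tuple[str, str, int, int]

# Hypothetical, simplified tokenizer: digit runs become NUM tokens, letter runs
# become W tokens, whitespace becomes SPACE, and any other character becomes a
# single-character OTHER token (for example "-" or ";").
TOKEN_PATTERN = re.compile(
    r"(?P<NUM>[0-9]+)|(?P<W>[^\W\d_]+)|(?P<SPACE>\s+)|(?P<OTHER>.)"
)


def tokenize(text: str) -> List[Token]:
    """Split text into tokens, skipping whitespace (assumed to be filtered)."""
    tokens = []
    for m in TOKEN_PATTERN.finditer(text):
        if m.lastgroup != "SPACE":
            tokens.append((m.lastgroup, m.group(), m.start(), m.end()))
    return tokens


def is_four_digit_num(token: Token) -> bool:
    """Approximate NUM{REGEXP("[0-9]{4}")}, assuming the whole token must match."""
    kind, text, _, _ = token
    return kind == "NUM" and re.fullmatch(r"[0-9]{4}", text) is not None


def extract_entities(text: str, parts: int = 4) -> List[str]:
    """Return the spans that the example rule would merge into one entity."""
    tokens = tokenize(text)
    entities = []
    i = 0
    while i < len(tokens):
        j = i
        matched = 0
        # NUM ANY? NUM ANY? NUM ANY? NUM: every NUM element must match.
        while matched < parts and j < len(tokens) and is_four_digit_num(tokens[j]):
            matched += 1
            j += 1
            # ANY?, simplified: skip at most one non-numeric token between parts.
            if matched < parts and j < len(tokens) and tokens[j][0] != "NUM":
                j += 1
        if matched == parts:
            # Like MARK(EntityType,1,7): one span from the first to the last part.
            entities.append(text[tokens[i][2]:tokens[j - 1][3]])
            i = j
        else:
            i += 1
    return entities


if __name__ == "__main__":
    print(extract_entities("Card 1234-5678-9012-3456 expires"))
    print(extract_entities("Only three parts: 1234-5678-9012"))
```

Running the file prints ['1234-5678-9012-3456'] for the first sample and [] for the second, because the second sample contains only three four-digit parts. Samples like these, one that should match and one that should not, are useful inputs for the Test window.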