Parse transforms
Use a parse transform rule to define a regular expression for text searches and matches, and to create Java code equivalent to the regular expression. Parse transform rules are referenced in parse transform collection rules (Rule-Parse-TransformCollection rule type), which in turn are referenced by parse infer rules (Rule-Parse-Infer rule type).
The following table defines fields and controls on the Transform tab of the Parse Transform form.
Field | Description |
---|---|
Regular expression |
Enter a regular expression that conforms to the Java syntax accepted by the java.util.regex package. You can use the typical syntactical elements, including brackets [ ], predefined character classes (\s for a white space character), boundary matchers (\b for a word boundary), quantifiers, and so on.
When you save the Parse Transform form, Pega Platform checks the syntax of your regular expression. See the Java documentation for the Pattern class for the authoritative definition of the syntax accepted here. You can use the Regular Expression tool to develop and validate the expression. See About the Regular Expression tool. |
Convert |
Optional. Enter Java code to convert and validate a source text string, and optionally to form a Pega Platform property type such as Date or DateTime. The Java code executes for each regular expression pattern match in the source text.
Click the Gear icon to start your Java editor or Notepad. Within the Java code, the following read-only variables are available:
Within the Java, set the
|
Output Type | Select the property type of the output value. |
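As an illustration of how a regular expression and per-match conversion code work together, the following is a minimal sketch in plain Java. It is not the code that Pega Platform generates, and it does not use the variables that the Convert field provides. The class name `DateMatchSketch`, the method `extractDates`, and the `M/d/yyyy` input format are assumptions made for this example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateMatchSketch {
    // Hypothetical pattern: a word boundary, a 1-2 digit month and day,
    // a 4-digit year, and a closing word boundary (for example 3/14/2024).
    static final Pattern DATE =
            Pattern.compile("\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b");

    // Assumed source format; "uuuu" is the year field of java.time.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("M/d/uuuu");

    // Runs the conversion once for each pattern match in the source text.
    static List<LocalDate> extractDates(String source) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher matcher = DATE.matcher(source);
        while (matcher.find()) {
            try {
                // Convert and validate: the regex checks only the shape,
                // the parser rejects values such as month 13.
                dates.add(LocalDate.parse(matcher.group(), FORMAT));
            } catch (DateTimeParseException e) {
                // The text matched the pattern but is not a valid date; skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(extractDates("Shipped 3/14/2024, due 13/45/2024."));
    }
}
```

The regular expression only locates candidate text; the conversion step decides whether each candidate is a usable value, which is why the two are kept separate in the sketch.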
The following table defines fields and controls on the Compile tab of the Parse Transform form.
Field | Description |
---|---|
Only allow UNIX line termination ('\n') |
Select to treat only a newline character (line feed, \n) as a line terminator. Sets the constant Pattern.UNIX_LINES.
|
Dot ('.') matches all characters including line terminators |
By default, a period character matches any character except a line terminator. Select to include line terminator characters in the set of characters that match a period.
Sets the constant
|
Start of line ('^') and end of line ('$') directives match line terminators |
By default, the expressions ^ and $ ignore line terminators and match only at the beginning and end of the entire input sequence. Select to make ^ also match immediately after any line terminator (except at the end of the input), and to make $ also match immediately before any line terminator.
Sets the constant
|
Allow embedded comments |
Select to allow white space and comments (starting with # and ending at the end of a line) to appear within the pattern, where they are ignored.
Sets the constant
|
Case Insensitive Matching for US-ASCII Characters |
Select to enable case-insensitive matching. Only characters in the US-ASCII character set are matched without regard to case, unless the next check box is also selected.
Sets the constant
|
Case Insensitive Matching for UNICODE Characters |
Select to enable UNICODE-aware case folding.
Sets the constant
|
Canonical Equivalence |
Select to enable canonical equivalence; two characters are considered to match only if their full canonical decompositions match.
Sets the constant
|
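The effect of these options can be tried outside the form with the flags of `java.util.regex.Pattern`. The sketch below assumes that each check box corresponds to the standard flag of the same meaning (`UNIX_LINES`, `DOTALL`, `MULTILINE`, `COMMENTS`, `CASE_INSENSITIVE`, `UNICODE_CASE`, `CANON_EQ`); this page names only `UNIX_LINES` explicitly, so treat the rest of the mapping as an assumption. The class `PatternFlagsSketch` and its helper methods are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

public class PatternFlagsSketch {
    // True when the whole input matches the regex compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // True when the regex is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // DOTALL: a period also matches a line terminator.
        System.out.println(matches("a.b", 0, "a\nb"));
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));

        // UNIX_LINES: only the line feed counts as a line terminator,
        // so a period can match a carriage return.
        System.out.println(matches("a.b", 0, "a\rb"));
        System.out.println(matches("a.b", Pattern.UNIX_LINES, "a\rb"));

        // MULTILINE: ^ and $ also match around line terminators inside the input.
        System.out.println(found("^two$", 0, "one\ntwo\nthree"));
        System.out.println(found("^two$", Pattern.MULTILINE, "one\ntwo\nthree"));

        // COMMENTS: white space and a trailing # comment in the pattern are ignored.
        System.out.println(matches("\\d+  # one or more digits", Pattern.COMMENTS, "42"));

        // CASE_INSENSITIVE alone covers US-ASCII letters only; adding
        // UNICODE_CASE extends case folding to other letters.
        String lowerE = "\u00e9";
        String upperE = "\u00c9";
        System.out.println(matches("pega", Pattern.CASE_INSENSITIVE, "PEGA"));
        System.out.println(matches(lowerE, Pattern.CASE_INSENSITIVE, upperE));
        System.out.println(matches(lowerE,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upperE));

        // CANON_EQ: per the Pattern documentation, a letter followed by a
        // combining mark matches the equivalent precomposed character.
        System.out.println(matches("a\u030A", Pattern.CANON_EQ, "\u00E5"));
    }
}
```

Flags are combined with the bitwise OR operator, which is how selecting several check boxes at once would be expressed in a single `Pattern.compile` call.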