Parsing emails
Pega Platform provides the pxEmailParser model, a preprocessing model that analyzes incoming emails and parses their content into logical components: body, signature, and disclaimer. You can then choose which components to include in text analysis and which to exclude.
Purpose
The email parser is typically used when a signature or disclaimer might skew the results of topic or sentiment analysis. The parser ensures that downstream models work only on the relevant portions of the email. To define which parts of an email to include in the analysis, configure the text analyzer that is associated with the email channel.
Email components
Email components that the email parser can identify hold specific types of information:
- Body: Contains the main message of an email.
- Disclaimer: Holds a legal notice or warning, for example, a copyright or confidentiality disclaimer. Usually placed after the signature.
- Signature: Contains a sign-off message, the sender's name, contact details, and similar information. Usually placed at the end of an email.
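To make the three components concrete, the following Python sketch splits a sample email by using simple keyword rules. It is only an illustration of the body, signature, and disclaimer split: pxEmailParser is a trained model, not a rule set, and every name in the sketch (SIGN_OFF, DISCLAIMER, ParsedEmail, split_email) is hypothetical and not part of Pega Platform.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns for this sketch only. A trained parser
# learns such cues from sample emails instead of hard-coding them.
SIGN_OFF = re.compile(
    r"^(best regards|kind regards|regards|sincerely|thanks|thank you)[,.!]?\s*$",
    re.IGNORECASE,
)
DISCLAIMER = re.compile(
    r"\b(confidential|copyright|intended recipient|disclaimer)\b",
    re.IGNORECASE,
)


@dataclass
class ParsedEmail:
    body: str
    signature: str
    disclaimer: str


def split_email(text: str) -> ParsedEmail:
    """Split an email into body, signature, and disclaimer (toy heuristic)."""
    lines = text.strip().splitlines()
    # The signature starts at the first sign-off line, if there is one.
    sig_start = next(
        (i for i, line in enumerate(lines) if SIGN_OFF.match(line.strip())),
        len(lines),
    )
    # In this sketch, a disclaimer is only looked for after the sign-off.
    disc_start = next(
        (i for i in range(sig_start, len(lines)) if DISCLAIMER.search(lines[i])),
        len(lines),
    )

    def join(part):
        return "\n".join(part).strip()

    return ParsedEmail(
        body=join(lines[:sig_start]),
        signature=join(lines[sig_start:disc_start]),
        disclaimer=join(lines[disc_start:]),
    )


sample = (
    "Hi team,\n"
    "My card was charged twice.\n"
    "\n"
    "Best regards,\n"
    "Anna Smith\n"
    "+1 555 0100\n"
    "\n"
    "This message is confidential and intended only for the addressee."
)
parsed = split_email(sample)
```

For the sample email, parsed.body holds the complaint text, parsed.signature holds the sign-off with the sender's name and phone number, and parsed.disclaimer holds the confidentiality notice.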
Configuration
Your application parses emails according to the settings of the text analyzer that is associated with an email channel. In the text analyzer configuration, you can select the default pxEmailParser or a different model as the preprocessing model that parses email content. You can also decide which types of analysis (topic, sentiment, entity) to perform on each email component (body, signature, disclaimer).
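The idea of choosing an analysis type for each component can be pictured as a simple mapping. The following Python sketch is a conceptual model only. In Pega Platform you make these choices in the text analyzer configuration, not in code, and the names ANALYSIS_SCOPE, COMPONENT_ORDER, and text_for_analysis are hypothetical. The sample scope, in which entity extraction also reads the signature, is an assumption chosen for illustration.

```python
from typing import Mapping

# Hypothetical settings: which email components feed each type of analysis.
ANALYSIS_SCOPE = {
    "topic": {"body"},
    "sentiment": {"body"},
    "entity": {"body", "signature"},  # assumption: signatures carry names and phone numbers
}

# Fixed order so that the selected text is assembled predictably.
COMPONENT_ORDER = ("body", "signature", "disclaimer")


def text_for_analysis(components: Mapping[str, str], analysis: str) -> str:
    """Return only the component text that the given analysis should see."""
    selected = ANALYSIS_SCOPE[analysis]
    return "\n".join(
        components[name]
        for name in COMPONENT_ORDER
        if name in selected and components.get(name)
    )


parts = {
    "body": "My card was charged twice.",
    "signature": "Anna Smith\n+1 555 0100",
    "disclaimer": "This message is confidential.",
}
sentiment_input = text_for_analysis(parts, "sentiment")
entity_input = text_for_analysis(parts, "entity")
```

With these sample settings, sentiment analysis receives only the body, entity extraction receives the body and the signature, and the disclaimer is excluded from every analysis.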
Training and testing
You can train the email parser with sample emails from your domain to increase the accuracy with which the email parser identifies the signature, body, and disclaimer. As you train or troubleshoot problems with the email parser, you can test the model to see if it works as expected.
Supported languages
The pxEmailParser model supports several languages.