Pega Platform provides the pxEmailParser model that you can use as a preprocessing model to analyze incoming emails and parse their content into logical components: body, signature, and disclaimer. You can define the components on which you want to perform text analysis and which you want to exclude from analysis.
The email parser is most useful when you expect the signature or disclaimer to skew the results of topic or sentiment analysis. The email parser ensures that the downstream models work only on the relevant portions of the email. You define which parts of an email to include in the analysis by configuring the text prediction that is associated with an email channel.
Email components that the email parser can identify hold specific types of information:
- Body: Contains the main message of an email.
- Disclaimer: Holds a legal notice or warning, for example, a copyright or confidentiality disclaimer. Usually placed after the signature.
- Signature: Contains a sign-off message, the sender's name, contact details, and similar information. Usually placed at the end of an email.
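To make the three components concrete, the following Python sketch splits a plain-text email by using simple keyword heuristics. It is an illustration only and does not reflect how pxEmailParser works internally, because pxEmailParser is a trained model. The `split_email` function, the `ParsedEmail` class, and the marker lists are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

# Hypothetical marker phrases, chosen for illustration only.
# A trained model learns such patterns from data instead of a fixed list.
SIGN_OFFS = ("best regards", "kind regards", "regards", "sincerely")
DISCLAIMER_MARKERS = ("confidential", "copyright", "intended recipient")


@dataclass
class ParsedEmail:
    body: str
    signature: str
    disclaimer: str


def split_email(text: str) -> ParsedEmail:
    """Split an email into body, signature, and disclaimer (toy heuristic)."""
    lines = text.strip().splitlines()
    n = len(lines)

    # The disclaimer starts at the first line that contains a disclaimer marker.
    disc_start = next(
        (i for i, line in enumerate(lines)
         if any(marker in line.lower() for marker in DISCLAIMER_MARKERS)),
        n,
    )

    # The signature starts at the first sign-off line before the disclaimer.
    sig_start = next(
        (i for i, line in enumerate(lines[:disc_start])
         if line.strip().lower().startswith(SIGN_OFFS)),
        disc_start,
    )

    def join(part):
        return "\n".join(part).strip()

    return ParsedEmail(
        body=join(lines[:sig_start]),
        signature=join(lines[sig_start:disc_start]),
        disclaimer=join(lines[disc_start:]),
    )
```

A keyword heuristic like this one misclassifies a body that happens to mention a marker word such as "confidential", which is one reason to use a trainable model for this task.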
Your application parses emails according to the Preprocessing model settings of the text prediction that is associated with an email channel. You can select the default pxEmailParser model or a different model as the preprocessing model that parses email content. In the Preprocessing model settings, you can also configure the following features of natural language processing (NLP):
- Define which email content is analyzed (body, attachment, or both).
- Decide which type of analysis (topic, sentiment, entity) you want to perform on each email component (body, signature, disclaimer).
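The per-component choice can be pictured as a mapping from each email component to the analysis types that are enabled for it. The following Python sketch is a conceptual illustration only. In Pega Platform, you make these choices in the text prediction settings rather than in code, and the names `ANALYSIS_PLAN` and `text_for` are hypothetical.

```python
# Hypothetical plan: which analysis types run on which email component.
ANALYSIS_PLAN = {
    "body": {"topic", "sentiment", "entity"},
    "signature": {"entity"},   # for example, to pick up names or phone numbers
    "disclaimer": set(),       # excluded from every analysis
}

COMPONENT_ORDER = ("body", "signature", "disclaimer")


def text_for(analysis, components, plan=ANALYSIS_PLAN):
    """Return only the component text that the plan enables for `analysis`."""
    selected = [
        components[name]
        for name in COMPONENT_ORDER
        if analysis in plan.get(name, set()) and components.get(name)
    ]
    return "\n".join(selected)
```

With this plan, sentiment analysis receives only the body, entity extraction receives the body and the signature, and the disclaimer never reaches a downstream model.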
Training and testing
You can train the email parser with sample emails from your domain to increase the accuracy with which the email parser identifies the signature, body, and disclaimer. As you train or troubleshoot problems with the email parser, you can test the model to see if it works as expected.
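One way to think about testing is to compare the component that the parser assigns to each line of a sample email with the component that you expect. The following Python sketch computes a per-component accuracy from two such label lists. It is a generic illustration, not the test feature of Pega Platform, and the function name `per_component_accuracy` is hypothetical.

```python
from collections import Counter


def per_component_accuracy(expected, predicted):
    """Return the share of correctly labeled lines for each expected component."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    total = Counter(expected)
    correct = Counter(e for e, p in zip(expected, predicted) if e == p)
    return {label: correct[label] / total[label] for label in total}
```

A low score for one component, for example the signature, suggests that the training data needs more samples in which that component is clearly represented.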
The pxEmailParser model supports several languages.