Best practices for providing feedback for text extraction models

Updated on July 5, 2022

For text extraction models, you can provide feedback on named entity recognition. You can also provide feedback for the email parser rule that distinguishes between the components of an email.

Named entity recognition

Consider the following text extraction use case:

  • Text to analyze – John works for uPlusTelco in New York. He works in UX design building user-friendly UIs. He was born in New Jersey.
  • Taxonomy topics – Customer Service, Warranty, Phone Camera, Screen Brightness
  • Entities detected after the text extraction model was run for the first time – uPlusTelco (Person), UX (Organization), New York (Location), New Jersey (Location)

Analysis

Analyzing the detected entities leads to the following conclusions:

  • The model did not detect John as a Person (false negative).
  • The model falsely detected UX as an Organization (false positive).
  • The model falsely detected uPlusTelco as a Person instead of as an Organization (false positive).
  • The model correctly detected New York and New Jersey as Locations (true positives).
  • The model correctly did not assign an entity type to UIs (true negative).
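
The analysis above can be sketched as a comparison between the entities that the model detected and the entities that a reviewer expects. The following Python snippet is only an illustration: the function name compare_entities and the dictionary layout are assumptions made for this example and are not part of the product. Following the analysis above, an entity that is detected with the wrong type counts as a false positive.

```python
# Entities that the model detected on its first run (taken from the example above).
predicted = {
    "uPlusTelco": "Person",
    "UX": "Organization",
    "New York": "Location",
    "New Jersey": "Location",
}

# Entities that a human reviewer expects (hypothetical reviewer input).
expected = {
    "John": "Person",
    "uPlusTelco": "Organization",
    "New York": "Location",
    "New Jersey": "Location",
}


def compare_entities(predicted, expected):
    """Sort text spans into true positives, false positives, and false negatives."""
    result = {"true_positive": [], "false_positive": [], "false_negative": []}
    for span, label in predicted.items():
        if expected.get(span) == label:
            result["true_positive"].append(span)
        else:
            # Either the span is not an entity at all, or its type is wrong.
            result["false_positive"].append(span)
    for span in expected:
        if span not in predicted:
            # The reviewer expects an entity that the model missed entirely.
            result["false_negative"].append(span)
    return result


report = compare_entities(predicted, expected)
for outcome, spans in report.items():
    print(outcome, spans)
```

True negatives, such as UIs, do not appear in either dictionary, so the sketch does not list them.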

Based on these observations, you can provide the following feedback to the text extraction model:

  • False positives – State that UX is not an Organization and uPlusTelco is not a Person but an Organization.
  • False negatives – State that John is a Person.

For example:

<START:PERSON> John <END> works for <START:ORGANIZATION> uPlusTelco <END> in <START:LOCATION> New York <END>. He works in UX design building user-friendly UIs. He was born in <START:LOCATION> New Jersey <END>.

where:

  • John and uPlusTelco are annotated with corrected entities.
  • UX is left unannotated, which tells the model that it is not an Organization.
  • The correctly detected entities (New York and New Jersey) remain annotated in the feedback. If you omit them, the model treats the missing annotations as negative feedback.
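
The annotated feedback string can also be assembled programmatically. The following Python snippet is a minimal sketch under stated assumptions: the helper name annotate is hypothetical, it tags only the first occurrence of each phrase, and it does not handle overlapping phrases. It is not a product API.

```python
def annotate(text, entities):
    """Wrap each (phrase, entity_type) pair in <START:TYPE> ... <END> tags.

    Assumption: each phrase occurs in the text and only its first
    occurrence is tagged. Overlapping phrases are not supported.
    """
    spans = []
    for phrase, entity_type in entities:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        spans.append((start, start + len(phrase), entity_type))
    spans.sort()  # process the spans from left to right

    parts = []
    cursor = 0
    for start, end, entity_type in spans:
        parts.append(text[cursor:start])  # untouched text before the entity
        parts.append(f"<START:{entity_type}> {text[start:end]} <END>")
        cursor = end
    parts.append(text[cursor:])  # remainder of the document
    return "".join(parts)


text = (
    "John works for uPlusTelco in New York. He works in UX design "
    "building user-friendly UIs. He was born in New Jersey."
)

# Corrected entities plus the entities that the model already detected correctly.
# UX is deliberately absent, so it stays unannotated.
corrected = [
    ("John", "PERSON"),
    ("uPlusTelco", "ORGANIZATION"),
    ("New York", "LOCATION"),
    ("New Jersey", "LOCATION"),
]

feedback = annotate(text, corrected)
print(feedback)
```

Because the helper returns the whole document with every valid entity tagged, the result can be submitted as a single piece of feedback.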

Consider the following points when providing feedback to entity models:

  • Always provide the entire document as feedback.
  • Annotate both the corrected entities and the entities that the model detected correctly. The model treats as feedback only the sentences whose annotations differ from its own output, so it trains only on the relevant sentences, which decreases the training time.
  • To reject an irrelevant entity, remove its <START:EntityType> and <END> tags.
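
To see why the entire document should be submitted even though only some sentences count as feedback, the sentence-level comparison can be approximated as follows. This Python snippet is a rough sketch and not the product's actual implementation: the function names, the regular expressions, and the naive sentence splitting are all assumptions made for illustration.

```python
import re

# Matches one annotation: <START:TYPE> phrase <END>
TAG = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")


def sentence_annotations(annotated):
    """Return (plain_sentence, annotations) pairs for an annotated document.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", annotated.strip()):
        annotations = {(phrase, etype.upper()) for etype, phrase in TAG.findall(sentence)}
        plain = TAG.sub(r"\2", sentence)  # drop the tags, keep the phrase
        pairs.append((plain, annotations))
    return pairs


def changed_sentences(model_output, feedback):
    """List the sentences whose annotations differ between output and feedback."""
    changed = []
    before = sentence_annotations(model_output)
    after = sentence_annotations(feedback)
    for (plain, old), (_, new) in zip(before, after):
        if old != new:
            changed.append(plain)
    return changed


model_output = (
    "John works for <START:PERSON> uPlusTelco <END> in <START:LOCATION> New York <END>. "
    "He works in <START:ORGANIZATION> UX <END> design building user-friendly UIs. "
    "He was born in <START:LOCATION> New Jersey <END>."
)
feedback = (
    "<START:PERSON> John <END> works for <START:ORGANIZATION> uPlusTelco <END> "
    "in <START:LOCATION> New York <END>. "
    "He works in UX design building user-friendly UIs. "
    "He was born in <START:LOCATION> New Jersey <END>."
)

for sentence in changed_sentences(model_output, feedback):
    print(sentence)
```

In this sketch, the third sentence is not reported because its New Jersey annotation is unchanged. Had that annotation been omitted from the feedback, the sentence would have been reported as a change, which is how an omission turns into negative feedback.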

Email parser

You can also provide feedback on the Email Parser entity model. This model divides emails into meaningful parts, such as the body, disclaimer, greeting, and signature. Because the model distinguishes between the parts of an email, you can select only the parts that are relevant to your analysis.

Consider the following document:

Hi Team,
Great job with the project!
Best regards,
The Management
Disclaimer: This is proprietary corporate communication.

After the analysis, the email parser produced the following output:

<START:Greetings>Hi Team,<END>
<START:Body>Great job with the project!<END>
<START:Signature>Best regards,
The Management
Disclaimer: This is proprietary corporate communication.<END>

Analysis

In this output, the signature also includes the disclaimer. Provide feedback to the model that separates the signature from the disclaimer, for example:

<START:Greetings>Hi Team,<END>
<START:Body>Great job with the project!<END>
<START:Signature>Best regards,
The Management<END>
<START:Disclaimer>Disclaimer: This is proprietary corporate communication.<END>

Note: You must provide the entire document as feedback, including both the updated and the correct annotations, in a single API call. The model considers as feedback only the annotations that differ from those originally applied in the output.
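
The correction for the email example can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names parse_segments, split_segment, and render are hypothetical, the line breaks inside the sample email are assumed, and the snippet does not call any product API. It only prepares the full annotated document that you would then submit as feedback.

```python
import re

# Matches one email part: <START:Label>text<END>. DOTALL lets text span lines.
SEGMENT = re.compile(r"<START:(\w+)>(.*?)<END>", re.DOTALL)


def parse_segments(annotated):
    """Return the (label, text) pairs found in an annotated email."""
    return SEGMENT.findall(annotated)


def split_segment(segments, label, marker, new_label):
    """Split every `label` segment at `marker`.

    The text from `marker` onward becomes a new segment tagged `new_label`.
    Segments that do not contain the marker are kept unchanged.
    """
    result = []
    for seg_label, text in segments:
        index = text.find(marker)
        if seg_label == label and index > 0:
            result.append((seg_label, text[:index]))
            result.append((new_label, text[index:]))
        else:
            result.append((seg_label, text))
    return result


def render(segments):
    """Serialize (label, text) pairs back into the annotated format."""
    return "".join(f"<START:{label}>{text}<END>" for label, text in segments)


# Parser output in which the signature swallowed the disclaimer.
model_output = (
    "<START:Greetings>Hi Team,<END>"
    "<START:Body>Great job with the project!<END>"
    "<START:Signature>Best regards,\nThe Management\n"
    "Disclaimer: This is proprietary corporate communication.<END>"
)

corrected = split_segment(
    parse_segments(model_output), "Signature", "Disclaimer:", "Disclaimer"
)
feedback = render(corrected)
print(feedback)
```

The rendered feedback keeps the unchanged Greetings and Body annotations next to the corrected ones, so the whole document can be sent at once.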