Best practices for providing feedback for text extraction models
For text extraction models, you can provide feedback on named entity recognition for regular text extraction models. You can also provide feedback for the email parser rule that distinguishes between various email components.
Named entity recognition
Consider the following classification use case:
- Text to analyze – John works for uPlusTelco in New York. He works in UX design building user-friendly UIs. He was born in New Jersey.
- Taxonomy topics – Customer Service, Warranty, Phone Camera, Screen Brightness
- Entities detected after the classification model was run for the first time – uPlusTelco (Person), UX (Organization), New York (Location), New Jersey (Location)
Analysis
By analyzing the detected entities, the following conclusions can be made:
- The model did not detect John as a Person (false negative).
- The model falsely detected UX as an Organization (false positive).
- The model falsely detected uPlusTelco as a Person instead of as an Organization (false positive).
- The model correctly detected New York and New Jersey as Locations (true positive).
- The model correctly did not assign an entity type to UIs (true negative).
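The analysis above compares what the model detected with what a reviewer expects. The following Python sketch shows that comparison for the example sentence. The `detected` and `expected` dictionaries and the `classify_detections` helper are hypothetical illustrations, not part of Pega. Because it compares (text, entity type) pairs, a wrong type such as uPlusTelco (Person) counts as a false positive for the wrong type and as a false negative for the correct type. True negatives such as UIs are not listed, because only entities are compared.

```python
# Entities returned by the model on the first run (from the example above).
detected = {
    "uPlusTelco": "Person",
    "UX": "Organization",
    "New York": "Location",
    "New Jersey": "Location",
}

# Entities a reviewer expects for the same text (hypothetical ground truth).
expected = {
    "John": "Person",
    "uPlusTelco": "Organization",
    "New York": "Location",
    "New Jersey": "Location",
}


def classify_detections(detected, expected):
    """Sort (text, type) pairs into true positives, false positives, and false negatives."""
    detected_pairs = set(detected.items())
    expected_pairs = set(expected.items())
    return {
        "true_positive": sorted(detected_pairs & expected_pairs),
        "false_positive": sorted(detected_pairs - expected_pairs),
        "false_negative": sorted(expected_pairs - detected_pairs),
    }


result = classify_detections(detected, expected)
for label, pairs in result.items():
    print(label, pairs)
```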
Based on these observations, you can provide the following feedback to the text extraction model:
- False positives – State that UX is not an Organization and uPlusTelco is not a Person but an Organization.
- False negatives – State that John is a Person.
For example:
<START:PERSON> John <END> works for <START:ORGANIZATION> uPlusTelco <END> in <START:LOCATION> New York <END>. He works in UX design building user-friendly UIs. He was born in <START:LOCATION> New Jersey <END>
where:
- John and uPlusTelco are annotated with their correct entity types.
- UX is no longer treated as an Organization because it is not annotated.
- All correctly detected entities (New York and New Jersey) are also annotated in the feedback. Otherwise, the model interprets the removed annotations as negative feedback.
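The annotated feedback string uses the <START:TYPE> ... <END> format from the example above. The following Python sketch builds that string from plain text and a list of corrected entities. It is only an illustration of the format under stated assumptions: `annotate` is a hypothetical helper, not a Pega API, entity types are upper-cased to match the example, and every whole-word occurrence of an entity string is tagged (a real tool would more likely work with character offsets).

```python
import re


def annotate(text, entities):
    """Wrap each whole-word entity occurrence in <START:TYPE> ... <END> tags.

    entities is a list of (surface_text, entity_type) pairs.
    """
    spans = []
    for surface, entity_type in entities:
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            spans.append((match.start(), match.end(), entity_type.upper()))
    spans.sort()

    parts = []
    pos = 0
    for start, end, entity_type in spans:
        if start < pos:  # Skip spans that overlap one already tagged.
            continue
        parts.append(text[pos:start])
        parts.append(f"<START:{entity_type}> {text[start:end]} <END>")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


text = ("John works for uPlusTelco in New York. He works in UX design "
        "building user-friendly UIs. He was born in New Jersey.")

# Corrected entities: John is added, uPlusTelco is retyped, UX is left out,
# and the correct detections (New York, New Jersey) are kept.
corrected = [
    ("John", "Person"),
    ("uPlusTelco", "Organization"),
    ("New York", "Location"),
    ("New Jersey", "Location"),
]

feedback = annotate(text, corrected)
print(feedback)
```

Leaving UX out of the corrected list is what marks it as not being an Organization, and keeping New York and New Jersey in the list avoids sending them as negative feedback.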
Consider the following points when providing feedback to entity models:
- Always provide the entire document as feedback.
- Annotate both the changed entities and the entities that the model detected correctly. Models treat as feedback only the sentences whose annotations differ from the model's own output, so the model trains only on the relevant sentences, which decreases training time.
- For irrelevant entities, remove the <START:EntityType> and <END> tags.
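The points above can be illustrated with a short Python sketch: one helper removes the tags around an irrelevant entity while keeping its words, and another lists the sentences whose annotations differ between the model output and the feedback. This only illustrates the idea; how Pega actually selects changed sentences is internal to the product. The helper names are hypothetical, and the sentence splitter is deliberately naive (it breaks after any period followed by whitespace).

```python
import re

# Matches one annotated entity, for example "<START:ORGANIZATION> UX <END>".
TAG = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")


def untag(annotated, surface):
    """Remove the tags around one irrelevant entity but keep its words."""
    def repl(match):
        return match.group(2) if match.group(2) == surface else match.group(0)
    return TAG.sub(repl, annotated)


def split_sentences(annotated):
    # Naive split after a period followed by whitespace; fine for this example only.
    return re.split(r"(?<=\.)\s+", annotated.strip())


def changed_sentences(model_output, feedback):
    """Return the feedback sentences whose annotations differ from the model output."""
    pairs = zip(split_sentences(model_output), split_sentences(feedback))
    return [fb for mo, fb in pairs if mo != fb]


# Model output from the example: uPlusTelco as Person, UX as Organization.
model_output = (
    "John works for <START:PERSON> uPlusTelco <END> in <START:LOCATION> New York <END>. "
    "He works in <START:ORGANIZATION> UX <END> design building user-friendly UIs. "
    "He was born in <START:LOCATION> New Jersey <END>."
)

# Full-document feedback with all corrections applied.
feedback = (
    "<START:PERSON> John <END> works for <START:ORGANIZATION> uPlusTelco <END> "
    "in <START:LOCATION> New York <END>. "
    "He works in UX design building user-friendly UIs. "
    "He was born in <START:LOCATION> New Jersey <END>."
)

print(untag(model_output, "UX"))
for sentence in changed_sentences(model_output, feedback):
    print("changed:", sentence)
```

Only the first two sentences differ, so only they would count as training feedback; the third sentence matches the model output and is skipped even though the whole document is submitted.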
Email parser
You can also provide feedback on the Email Parser entity model. This model divides emails into meaningful parts such as the body, disclaimer, greeting, and signature. By distinguishing between these parts, you can select only the ones that are relevant to your analysis.
Consider the following document:
Hi Team,
Great job with the project!
Best regards,
The Management
Disclaimer: This is proprietary corporate communication.
After the analysis, the email parser produced the following output:
- Greeting – Hi Team,
- Body – Great job with the project!
- Signature – Best regards, The Management Disclaimer: This is proprietary corporate communication.
Analysis
In this output, the signature also includes the disclaimer. Provide feedback to the model so that it separates the signature from the disclaimer, for example:
- Greeting – Hi Team,
- Body – Great job with the project!
- Signature – Best regards, The Management
- Disclaimer – Disclaimer: This is proprietary corporate communication.
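The correction above moves the disclaimer out of the signature. The following Python sketch shows the shape of that correction on a simple representation of the parser output. It is not how the Pega email parser works internally, and feedback is not supplied in this form: the (part, lines) list, the part labels, and the rule that a disclaimer starts with "Disclaimer:" are all assumptions made for this example only.

```python
# Hypothetical representation of the parser output: (part, lines) pairs.
parser_output = [
    ("greeting", ["Hi Team,"]),
    ("body", ["Great job with the project!"]),
    ("signature", [
        "Best regards,",
        "The Management",
        "Disclaimer: This is proprietary corporate communication.",
    ]),
]


def split_disclaimer(segments, marker="Disclaimer:"):
    """Return a copy of segments with a disclaimer moved out of the signature.

    Assumes (for this example only) that the disclaimer starts on a line
    beginning with the marker and runs to the end of the signature segment.
    """
    corrected = []
    for part, lines in segments:
        if part == "signature":
            for i, line in enumerate(lines):
                if line.startswith(marker):
                    if lines[:i]:  # Keep the signature only if anything precedes the disclaimer.
                        corrected.append(("signature", lines[:i]))
                    corrected.append(("disclaimer", lines[i:]))
                    break
            else:
                # No disclaimer found: keep the signature unchanged.
                corrected.append((part, lines))
        else:
            corrected.append((part, lines))
    return corrected


corrected = split_disclaimer(parser_output)
for part, lines in corrected:
    print(f"{part}: {' '.join(lines)}")
```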