Preparing data for text extraction

Updated on May 17, 2024

In the Source selection step of the text extraction model creation wizard, select the extraction type and provide the data for training and testing of your text extraction model.

  1. In the Extraction type section, select a recognizer type:
    • To detect word-level entities, such as person or location, select Default entity recogniser.
    • To detect paragraph-level entities, such as email disclaimer, select Paragraph entity recogniser.
  2. Optional: To view the template for testing and training data, click Download template.
    An example training data record is: Hi, this is <START:name> Bart <END>, where:
    • <START:name> – Marks the start and type of the entity. In the preceding example, the model will detect the string Bart as name.
    • <END> – Marks the end of the entity.
  3. To upload a CSV, XLS, or XLSX file that contains training and testing data for your text extraction model, click Choose file and select the file.
    After you select a valid file, you can preview the types of identified entities and the size of training and testing data. Depending on your business needs, you can exclude entity types from training data. Additionally, you can view errors, for example, missing <START> or <END> tags.
  4. If your file contains errors, perform any of the following actions:
    • Exclude errors from the model by selecting the Exclude below error records and build model check box.
    • Correct errors in the file and repeat step 3.
  5. Click Next.
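
Before uploading, it can help to check records locally for the tag errors mentioned in step 3. The following is a minimal sketch, not the validation that the wizard performs: it assumes the tag syntax shown in the example record (<START:type> ... <END>), and the names TAG and parse_record are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed tag syntax, based on the example record: <START:type> text <END>
TAG = re.compile(r"<START:([^>]+)>|<END>")


def parse_record(record):
    """Return (entities, errors) for one annotated training record.

    entities is a list of (entity_type, text) pairs; errors is a list of
    human-readable messages about unbalanced tags.
    """
    entities, errors = [], []
    open_type, open_pos = None, None
    for match in TAG.finditer(record):
        if match.group(1) is not None:
            # A <START:type> tag: the previous entity must already be closed.
            if open_type is not None:
                errors.append(f"missing <END> for '{open_type}'")
            open_type, open_pos = match.group(1), match.end()
        else:
            # An <END> tag: it must close a currently open entity.
            if open_type is None:
                errors.append("missing <START> before <END>")
            else:
                text = record[open_pos:match.start()].strip()
                entities.append((open_type, text))
                open_type = None
    if open_type is not None:
        errors.append(f"missing <END> for '{open_type}'")
    return entities, errors


if __name__ == "__main__":
    print(parse_record("Hi, this is <START:name> Bart <END>"))
    print(parse_record("Hi, this is <START:name> Bart"))
```

Running such a check on each row of the file before upload is one way to find records that the wizard would otherwise report as errors in step 4.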