Defining a taxonomy
After you create a model, define the corresponding taxonomy by adding a list of topics to detect in a piece of text. For each topic, you add a list of keywords that define the topic. Based on these keywords, a Text Analyzer rule assigns topics to the analyzed piece of text.
- Optional: To import a .CSV, .XLS, or .XLSX file that contains a taxonomy, select Manage > Import.
For more information on taxonomy files, see Requirements and best practices for creating a taxonomy for rule-based classification analysis on Pega Community.
- To create a parent topic, click Add top most.
- Optional: To create a child topic, select a parent topic and click Manage > Add.
You can add multiple levels of topics, depending on your use case and classification problem. For example, you can break down the parent category Support into In-store support and Phone support.
- Optional: To detect child topics only when the corresponding parent topic is detected, select Match child topics only if the current topic matches.
- Select a topic and enter a list of keywords that pertain to that topic.
You can specify the following types of keywords:
- Should words
- If the Text Analyzer encounters any of the Should words in a piece of text, it assigns that text to the corresponding topic. To increase categorization accuracy, create an exhaustive list of Should words for each topic. For example, a topic Support can include the following keywords: help, assistance, support, aid, guidance, assist, advice, and so on.
- Must words
- You can narrow down your categorization conditions by specifying the words that the content must contain to be assigned to the corresponding topic. For a piece of text to be assigned to a topic, that text must contain all of the topic's Must words. For example, if you add help and assistance as Must words for the parent category Support, a piece of text must contain both words to be assigned to that category.
- And words
- And words are commonly used together with Should words to increase the accuracy and effectiveness with which the Text Analyzer assigns categories. Use And words to distinguish between similar categories. For example, you can specify premises, store, and office as And words for In-store support, and phone and call as And words for Phone support, while both categories share the same set of Should words.
- Not words
- Specify the words that prevent a Text Analyzer from assigning a piece of text to the corresponding topic. For example, enter phone and call as Not words for the In-store support topic so that a piece of text that contains either word is not assigned to that topic.
- Optional: To test the taxonomy, select Actions > Test.
Pega recommends that you always test your taxonomy on several text samples to determine whether it assigns topics accurately. Depending on the results, you might refine your taxonomy, for example, by adding Should words to accommodate additional use cases or by adding Not words to help differentiate between similar categories.
- Optional: To export the taxonomy as an .XLSX file, select Actions > Export.
- To save the taxonomy, click Save.
You can use the taxonomy as part of a machine-learning topic detection model or directly in Text Analyzers to perform keyword-based topic detection.
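To illustrate how the four keyword types and the Match child topics only if the current topic matches option might combine, the following Python sketch models keyword-based topic assignment for the Support example used in this procedure. It is a hypothetical illustration, not Pega's implementation: the Topic class, the classify function, exact single-word matching, and the assumed And-word rule (at least one And word is required in addition to a Should word) are all assumptions. An actual Text Analyzer rule might tokenize, normalize, or evaluate keywords differently.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory model of one taxonomy topic (not a Pega API)."""
    name: str
    should_words: set[str] = field(default_factory=set)
    must_words: set[str] = field(default_factory=set)
    and_words: set[str] = field(default_factory=set)
    not_words: set[str] = field(default_factory=set)
    children: list["Topic"] = field(default_factory=list)
    # Assumed equivalent of "Match child topics only if the current topic matches".
    match_children_only_if_matched: bool = False


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of single words (assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def topic_matches(topic: Topic, tokens: set[str]) -> bool:
    """Apply the assumed Should/Must/And/Not semantics to one topic."""
    if topic.not_words & tokens:  # any Not word blocks the topic
        return False
    if not topic.must_words <= tokens:  # every Must word is required
        return False
    if topic.should_words and not (topic.should_words & tokens):
        return False  # at least one Should word is required
    if topic.and_words and not (topic.and_words & tokens):
        return False  # at least one And word is also required (assumption)
    # A topic without any Should or Must words never matches.
    return bool(topic.should_words or topic.must_words)


def classify(topics: list[Topic], text: str) -> list[str]:
    """Return the path of every topic assigned to the text."""
    tokens = tokenize(text)
    assigned: list[str] = []

    def visit(topic: Topic, prefix: str) -> None:
        path = f"{prefix}/{topic.name}" if prefix else topic.name
        matched = topic_matches(topic, tokens)
        if matched:
            assigned.append(path)
        # Skip the children only when the parent is gated and did not match.
        if matched or not topic.match_children_only_if_matched:
            for child in topic.children:
                visit(child, path)

    for topic in topics:
        visit(topic, "")
    return assigned


# Example taxonomy based on the Support topic described above.
support_words = {"help", "assistance", "support", "aid", "guidance", "assist", "advice"}

taxonomy = [
    Topic(
        name="Support",
        should_words=support_words,
        match_children_only_if_matched=True,
        children=[
            Topic("In-store support", should_words=support_words,
                  and_words={"premises", "store", "office"},
                  not_words={"phone", "call"}),
            Topic("Phone support", should_words=support_words,
                  and_words={"phone", "call"}),
        ],
    )
]

print(classify(taxonomy, "I need help at the store"))
# ['Support', 'Support/In-store support']
print(classify(taxonomy, "Please call me, I need assistance"))
# ['Support', 'Support/Phone support']
print(classify(taxonomy, "My order arrived late"))
# []
```

Running a few sample sentences through such a model mirrors the advice to test the taxonomy on several text samples: a sentence that lands in the wrong topic usually points to a missing Should, And, or Not word.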