Verify that the taxonomy of topics that Prediction Studio generated from the training data is correct. If you updated an older
version of a model, the taxonomy might include topics from that version. Clean up your model
by deleting topics that have no training data, and improve the model's predictions by adding keywords.
Keywords influence the behavior of a machine learning model, but they
are not exact rules. The Should, Must, and And
words act as positive features for matching a text to a topic, while the
Not words act as negative features. The training and testing data have
the greatest impact on your machine learning model, while keywords have a smaller impact.
You cannot add topics in this step. If you want to add topics, go back to the
Source selection step. For more information, see Uploading data for training and testing of the topic model.
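The earlier note about keywords acting as features rather than exact rules can be illustrated with a minimal sketch. This is not how Prediction Studio computes its scores; the `TopicKeywords` class, the `keyword_features` and `adjusted_score` functions, and the `keyword_weight` value are all hypothetical names and numbers chosen for illustration. The sketch only shows the general idea that keyword matches nudge a score learned from training data instead of overriding it.

```python
from dataclasses import dataclass, field


@dataclass
class TopicKeywords:
    """Hypothetical container for the keywords of one topic."""
    positive: list[str] = field(default_factory=list)  # Should, Must, and And words
    negative: list[str] = field(default_factory=list)  # Not words


def keyword_features(text: str, keywords: TopicKeywords) -> dict[str, int]:
    """Count how many positive and negative keywords occur in the text."""
    lowered = text.lower()
    return {
        "positive_hits": sum(kw.lower() in lowered for kw in keywords.positive),
        "negative_hits": sum(kw.lower() in lowered for kw in keywords.negative),
    }


def adjusted_score(model_score: float, features: dict[str, int],
                   keyword_weight: float = 0.05) -> float:
    """Nudge a score learned from training data by a small keyword term.

    The small default weight is an assumption. It mirrors the statement that
    training data has a greater impact on the model than keywords.
    """
    delta = keyword_weight * (features["positive_hits"] - features["negative_hits"])
    # Clamp the result so that it stays a valid score between 0 and 1.
    return min(1.0, max(0.0, model_score + delta))


# Hypothetical topic with two positive keywords and one negative keyword.
billing = TopicKeywords(positive=["invoice", "late fee"], negative=["job offer"])
features = keyword_features("I was charged a late fee on my invoice", billing)
print(features)
print(adjusted_score(0.60, features))
```

Because the keyword term is additive and small, a text that matches a Not word can still be assigned to the topic when the learned score is high enough, which is the sense in which keywords are features and not rules.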
What to do next: Select the algorithms that Prediction Studio uses to build the model, and then start the building process. For more information, see
Training and testing the topic model.
- In the Taxonomy review wizard step, review the taxonomy details,
and then expand the taxonomy to view the topics.
The hierarchy of the taxonomy is used to group topics. Do not add training data or
keywords to grouping topics.
- Review the summary of training and test data for individual topics by selecting the
topics in the list.
- Optional: To add positive or negative features for matching a text to a topic, add keywords to
the topic:
  - Select the topic, and then click Manage keywords.
  - In the Keywords section, enter keywords to influence how the model matches text to the topic.
    Keywords can be words or phrases. You can enter several keywords in each category:
    - Should words
    - Must words
    - And words
    - Not words
- Optional: To delete topics that do not contain any training data, select a topic, and then delete it.
Topics without any training data might appear in the taxonomy when you start with a
keyword-based model, and then update it to a machine learning model. If the training data
that you use to train the new model contains fewer topics than the original
keyword-based model, only the topics that appear in the training data are trained, and the
remaining topics have no training data.
- Click Next.
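Before you review the taxonomy in the wizard, it can help to know in advance which topics have no training data. The following sketch is one way to check this outside Prediction Studio, assuming that you can export the taxonomy as a list of topic names and the training data as a list of topic labels. The function name, the `>` separator in topic names, and the sample data are hypothetical. Grouping topics are excluded from the result because they are not expected to have training data.

```python
from collections import Counter


def topics_without_training_data(taxonomy_topics, training_labels, grouping_topics=()):
    """Return non-grouping topics from the taxonomy that have no training records."""
    counts = Counter(training_labels)
    grouping = set(grouping_topics)
    return [
        topic for topic in taxonomy_topics
        if topic not in grouping and counts[topic] == 0
    ]


# Hypothetical taxonomy carried over from an older keyword-based model.
taxonomy = ["Support", "Support > Billing", "Support > Shipping", "Support > Returns"]
# Hypothetical topic labels found in the new training data.
labels = ["Support > Billing", "Support > Billing", "Support > Shipping"]

# Topics in this list are candidates for deletion in the taxonomy review step.
empty_topics = topics_without_training_data(taxonomy, labels, grouping_topics=["Support"])
print(empty_topics)
```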