

Defining the training and testing samples for topic detection

Updated on May 17, 2024

Split the uploaded data into one set for training the model and another set for testing the model's accuracy.

The topic detection model learns from the training data that you provide. Prediction Studio then tests the model against the data that you mark for testing.
  1. In the Sample construction wizard step, specify how you want to split the training and testing samples by performing one of the following actions:
    • If you want Prediction Studio to test the model against the records for which you entered Test in the Type column, select User defined sampling. Use this option when you want every model that you generate to be tested against the same specific sentences.
    • If you want to randomly assign records for testing, select Uniform sampling, and then specify the percentage of records to use for testing.
  2. If the model creation wizard displays issues in the Warnings section, address the issues before proceeding.
    The issues displayed by the wizard refer to the training and testing sample that you provide. Example issues include:
    • Incorrectly formatted columns or missing values.
    • Categories from the taxonomy do not have a match in the training and testing sample.
    • Categories from the training and testing sample do not have a match in the taxonomy.
  3. Click Next.
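
The two sampling options and the category checks described above can be illustrated outside the product with a small Python sketch. This is not how Prediction Studio implements sampling; it only shows the idea. The Type column comes from the procedure above, while the Content and Category column names, the function names, and the sample records are hypothetical and exist only for illustration.

```python
import random

def split_user_defined(records):
    """User defined sampling: rows marked Test in the Type column form the test set."""
    test = [r for r in records if r["Type"].strip().lower() == "test"]
    train = [r for r in records if r["Type"].strip().lower() != "test"]
    return train, test

def split_uniform(records, test_percent, seed=0):
    """Uniform sampling: randomly assign about test_percent of the rows to the test set."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(records)       # copy, so the input order is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]

def category_mismatches(records, taxonomy_categories):
    """Return (taxonomy categories missing from the sample,
    sample categories missing from the taxonomy)."""
    sample = {r["Category"] for r in records}
    taxonomy = set(taxonomy_categories)
    return taxonomy - sample, sample - taxonomy

# Made-up records; the Content and Category column names are assumptions.
records = [
    {"Content": "My card was charged twice", "Category": "Billing", "Type": "Test"},
    {"Content": "How do I reset my password", "Category": "Account", "Type": ""},
    {"Content": "The invoice total is wrong", "Category": "Billing", "Type": ""},
    {"Content": "The package arrived damaged", "Category": "Shipping", "Type": "Test"},
]
taxonomy = ["Billing", "Account", "Returns"]  # hypothetical taxonomy categories

train_ud, test_ud = split_user_defined(records)
train_un, test_un = split_uniform(records, test_percent=25)
missing_in_sample, missing_in_taxonomy = category_mismatches(records, taxonomy)
```

With user defined sampling the test set is the same every time, which makes results comparable across models. With uniform sampling the membership of the test set depends on the random draw, and only its size is controlled by the percentage.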
What to do next: Define the taxonomy that you want to use for topic detection. For more information, see Reviewing the taxonomy for machine learning topic detection.
