Defining the training and testing samples for topic detection

In the Sample construction step, split the data into the set that is used to train the model and the set that is used to test the model's accuracy.

  1. Specify the split between the training and testing samples by performing one of the following actions:
    • To assign to the testing sample only the records whose Type field in the uploaded file is set to Test, select the User-defined sampling based on 'Type' column check box. Use this option if you have specific sentences that must be used to test accuracy every time that a model is generated.
    • To manually specify the percentage of records that are randomly assigned to the training sample, select the Uniform sampling check box.
  2. Correct any issues with the training and testing sample that are displayed in the Warnings section.
    Examples of issues that might be reported include the following items:
    • Improperly formatted columns or missing values.
    • Categories from the taxonomy that have no match in the training and testing sample.
    • Categories from the training and testing sample that have no match in the taxonomy.

    To increase the quality of the model, correct any missing values, file formatting problems, inconsistencies between the taxonomy and the training and testing sample, and any other reported issues.

  3. Click Next.
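
The two sampling options and the taxonomy checks from the Warnings section can be illustrated outside the product with a minimal Python sketch. It is not the product's implementation. The Type field and the Test value come from the procedure above; the Category field name, the function names, and the seed parameter are hypothetical and exist only for illustration.

```python
import random


def split_by_type(records, type_field="Type", test_value="Test"):
    """User-defined sampling: records whose Type is Test form the testing sample."""
    train = [r for r in records if r.get(type_field) != test_value]
    test = [r for r in records if r.get(type_field) == test_value]
    return train, test


def split_uniform(records, train_percent, seed=None):
    """Uniform sampling: randomly assign train_percent of records to training."""
    rng = random.Random(seed)  # seed is an assumption, used for repeatable runs
    shuffled = list(records)   # copy so that the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


def category_mismatches(taxonomy_categories, records, category_field="Category"):
    """Report categories that appear on only one side (taxonomy or sample)."""
    # "Category" is a hypothetical column name for the assigned topic.
    in_sample = {r[category_field] for r in records if r.get(category_field)}
    in_taxonomy = set(taxonomy_categories)
    return {
        "taxonomy_only": sorted(in_taxonomy - in_sample),
        "sample_only": sorted(in_sample - in_taxonomy),
    }
```

For example, calling split_uniform(records, 70) on 10 records returns 7 training records and 3 testing records, and category_mismatches lists the categories that would need to be added to either the taxonomy or the sample before the model is generated.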