In the Sample construction step, split the data into the set that is used to train the model and the set that is used to test the model's accuracy.
-
Specify the split between the training and testing samples by performing one of the
following actions:
- To assign only the records whose Type field in the file
that you uploaded is set to Test to the testing sample, select
the User-defined sampling based on 'Type' column check box. Use
this option if you have specific sentences that must be tested for accuracy with every
model generation.
- To manually specify the percentage of records that are randomly assigned to the
training sample, select the Uniform sampling check box.
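The two sampling options can be pictured with a short sketch. It is not the product's implementation, only an illustration under stated assumptions: records are plain dictionaries, the Type column holds the value Test for testing rows, and the function name split_records, the mode names, and the train_fraction and seed parameters are hypothetical.

```python
import random


def split_records(records, mode="type", train_fraction=0.8, seed=0):
    """Split records into (training, testing) lists.

    Hypothetical helper: the names and defaults are assumptions made
    for this illustration, not part of the product.
    """
    if mode == "type":
        # User-defined sampling: only rows whose Type is "Test" are tested.
        testing = [r for r in records if r.get("Type") == "Test"]
        training = [r for r in records if r.get("Type") != "Test"]
        return training, testing
    if mode == "uniform":
        # Uniform sampling: shuffle a copy, then cut at the chosen percentage.
        shuffled = list(records)
        random.Random(seed).shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]
    raise ValueError(f"unknown mode: {mode}")
```

With the Type-based mode, the same rows land in the testing sample on every run, which is why that option suits a fixed set of test sentences. With the uniform mode, membership depends on the random shuffle.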
-
Correct any issues with the training and testing sample that are displayed in the
Warnings section.
Examples of issues that can be reported include the following items:
- Improperly formatted columns or missing values.
- The categories from the taxonomy that do not have a match in the training and
testing sample.
- The categories from the training and testing sample that do not have a match in the
taxonomy.
To increase the quality of the model, correct any missing values, file formatting
problems, inconsistencies between the taxonomy and the training and testing sample,
and any other reported issues.
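If you want to catch these issues before you upload the file, a check along the following lines might help. It is a minimal sketch, not the product's validation logic: the function name check_sample and the column names Text and Category are assumptions for illustration. Only the Type column is named in this procedure.

```python
def check_sample(rows, taxonomy, required=("Text", "Category", "Type")):
    """Report missing values and taxonomy mismatches in a sample.

    Hypothetical helper: "Text" and "Category" are assumed column names.
    """
    # Indexes of rows in which any required column is empty or absent.
    rows_with_missing_values = [
        i for i, row in enumerate(rows)
        if any(not (row.get(col) or "").strip() for col in required)
    ]
    sample_categories = {row["Category"] for row in rows if row.get("Category")}
    taxonomy_categories = set(taxonomy)
    return {
        "rows_with_missing_values": rows_with_missing_values,
        # Taxonomy categories that have no match in the sample.
        "taxonomy_only": sorted(taxonomy_categories - sample_categories),
        # Sample categories that have no match in the taxonomy.
        "sample_only": sorted(sample_categories - taxonomy_categories),
    }
```

An empty list under each key suggests that the file is free of the three kinds of issues listed above. Other warnings can still appear in the Warnings section.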
-
Click Next.