Constructing a sample
A sample is a subset of historical data that you extract by applying a selection or sampling method to the data source. Constructing a sample helps you create the development, validation, and test data sets for analysis and modeling.
- In the Data preparation step, in the Sample construction workspace, from the Select the weight field if present drop-down list, click an available weight field.
Typically, a weight field is available when you sample the data before using it in the Prediction Studio portal. If you do not specify the field, each case counts as one.
- In the Select the fields to sample grid, specify the fields you want to include in the sample:
- In the Type column, select a field type from the drop-down list.
Select the Not used type for fields that you want to exclude from the sample.
- Optional: In the Description column, enter a field definition.
- Optional: In the User defined field, type a new name for a field.
- Select a sampling method:
To sample a simple proportion of cases, select the Uniform sampling option. This method fills the sample table with a random selection of records from the source. The probability of selection is set to achieve the specified percentage or number of cases.
To sample a different proportion of each value for the selected field (stratum) that represents the behavior to be predicted, perform the following actions:
- Select the Stratified sampling option.
- From the Stratum field drop-down list, select the field you want to sample.
- In the table with stratum values, in the Ratio column, set the proportion of population cases to source records.
- In the Sample percentage column, enter the percentage of records that you want to sample.
This method fills the sample table with random selections of each class.
- In the Hold-out sets section, define the sample percentage that you want to use for development, validation, and testing:
- To divide cases among the sets, select the Setting percentages for each set option.
- To divide cases that are available for the field, select the User defined field option.
- Optional: Select a field from the data source to assign the records with the same value to one hold-out set.
- Confirm the sample construction by clicking Next.
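The sampling methods and the hold-out split described in the steps above can be illustrated outside the portal with a minimal Python sketch. It is not the Prediction Studio implementation. The function names, the record layout, the `outcome` stratum field, and the percentages are all hypothetical, and the Ratio column and the weight field are not modeled. The sketch assumes that each record is selected independently with a probability equal to the requested percentage.

```python
import random


def uniform_sample(records, percentage, rng):
    """Uniform sampling: keep each record with probability percentage/100."""
    p = percentage / 100.0
    return [r for r in records if rng.random() < p]


def stratified_sample(records, stratum_field, percentages, rng):
    """Stratified sampling: each stratum value has its own sample percentage.

    Records whose stratum value is missing from `percentages` are never sampled.
    """
    sample = []
    for r in records:
        p = percentages.get(r[stratum_field], 0.0) / 100.0
        if rng.random() < p:
            sample.append(r)
    return sample


def assign_hold_out(sample, percentages, rng):
    """Randomly assign every sampled record to exactly one hold-out set."""
    names = list(percentages)
    weights = [percentages[n] for n in names]
    sets = {n: [] for n in names}
    for r in sample:
        chosen = rng.choices(names, weights=weights, k=1)[0]
        sets[chosen].append(r)
    return sets


# Hypothetical source data: 1000 cases, 10% of them with the "churn" outcome.
rng = random.Random(42)  # fixed seed so that repeated runs give the same sample
records = [
    {"id": i, "outcome": "churn" if i % 10 == 0 else "stay"}
    for i in range(1000)
]

# Uniform sampling: roughly 20% of all cases.
uniform = uniform_sample(records, 20, rng)

# Stratified sampling: keep every "churn" case, but only about 25% of "stay" cases.
stratified = stratified_sample(records, "outcome", {"churn": 100, "stay": 25}, rng)

# Hold-out sets: divide the stratified sample 60/20/20.
sets = assign_hold_out(
    stratified, {"development": 60, "validation": 20, "test": 20}, rng
)

print(len(uniform), len(stratified), {name: len(rows) for name, rows in sets.items()})
```

Because selection is random, the sample sizes only approximate the requested percentages. The exception is a stratum sampled at 100 percent, which is always kept in full.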