Constructing a sample
A sample is a subset of historical data that you extract by applying a selection or sampling method to the data source. By constructing a sample, you create the development, validation, and test data sets for analysis and modeling.
- Select the weight field, if present.
Typically, a weight field is available when the data was sampled before you use it in the Analytics Center portal. If you do not specify a weight field, each case counts as one.
- In the Select the fields to sample grid, set the field type and define the fields that you want to include in the sample.
You can select the Not used type if you do not want to use a particular field.
- Optional: In the User defined field, type a new name for the field.
- Optional: In the Description field, type a description.
- Select a sampling method.
- Uniform sampling - Samples a simple proportion of cases. It fills the sample table with a random selection of records from the source. The probability of selection is set to achieve the specified percentage or number of cases.
- Stratified sampling - Samples a different proportion of cases for each value of the selected field (the stratum), which represents the behavior to be predicted. It fills the sample table with a random selection of records from each class.
When you select a stratum field, you can set the ratio of population cases to source records and the percentage of records that you want to sample.
Note: A population is a group of cases with known behavior that is consistent with the group of cases whose behavior you want to predict. You use the population to extract data samples for modeling and validation.
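The two sampling methods can be illustrated with a minimal Python sketch. It is not the portal's implementation: the function names, the record layout (a list of dictionaries), and the `outcome` stratum field are assumptions made for illustration only.

```python
import random


def uniform_sample(records, fraction, seed=0):
    """Keep each record independently with the same probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


def stratified_sample(records, stratum_field, fractions, seed=0):
    """Keep each record with the probability set for its stratum value.

    `fractions` maps each stratum value to a selection probability;
    values that are not listed are never selected.
    """
    rng = random.Random(seed)
    return [
        r for r in records
        if rng.random() < fractions.get(r[stratum_field], 0.0)
    ]


# Hypothetical source: 1,000 cases, of which every tenth has the outcome "bad".
source = [
    {"id": i, "outcome": "bad" if i % 10 == 0 else "good"}
    for i in range(1000)
]

# Uniform: about 20% of all cases.
uniform = uniform_sample(source, 0.20)

# Stratified: keep every rare "bad" case but only about 10% of "good" cases.
stratified = stratified_sample(source, "outcome", {"bad": 1.0, "good": 0.10})
```

Because selection is probabilistic, the sample size only approximates the requested percentage. To target a number of cases instead, one option is to set the probability to the desired count divided by the number of source records.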
- In the Hold-out sets section, define the sample percentage that you want to use for development, validation, and testing.
Select Setting percentages for each set to divide the cases among the sets by percentage, or select User defined field to divide the cases based on that field.
- Optional: Select a field from the data source to assign records with the same value to the same hold-out set.
For example, you can place family members from the same household into one hold-out set. Family members might have similar profiles, which can mask overfitting during validation if they are split across different hold-out sets.
Note: The hold-out set for each group of records is selected at random.
- Click Next.
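The hold-out step with a grouping field can be sketched as follows. This is an illustration under assumptions, not the portal's implementation: the function name `assign_holdout_sets`, the `household` field, and the 60/20/20 split are hypothetical.

```python
import random


def assign_holdout_sets(records, percentages, group_field=None, seed=0):
    """Assign each record to a hold-out set chosen at random.

    `percentages` maps a set name to its share of the cases.
    If `group_field` is given, all records that share a value in that
    field are assigned to the same set.
    """
    rng = random.Random(seed)
    names = list(percentages)
    weights = [percentages[name] for name in names]
    set_for_group = {}
    assignment = []
    for index, record in enumerate(records):
        # Without a grouping field, every record forms its own group.
        key = record[group_field] if group_field is not None else index
        if key not in set_for_group:
            set_for_group[key] = rng.choices(names, weights=weights, k=1)[0]
        assignment.append(set_for_group[key])
    return assignment


# Hypothetical data: 300 people in households of three.
people = [{"id": i, "household": i // 3} for i in range(300)]
split = {"development": 60, "validation": 20, "test": 20}
sets = assign_holdout_sets(people, split, group_field="household")
```

Because whole groups are assigned at once, the realized share of each set can drift from the requested percentages, especially when groups are large or few.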