Data analysis settings
You can change these settings to affect the Data analysis step of the predictive model configuration process that is described in Analyzing data. The settings include the names of the sections that are displayed for the step and the default values for particular options.
Setting | Description |
Label | |
Wide of scheme | Change the label for cases not found in the development sample. |
Missing | Change the label for missing values. |
Residual group | Change the label for the intervals that are so small that their behavior is not a reliable basis for grouping them in another interval. |
Remaining symbols | Change the label for the intervals that are so small that their behavior is not a reliable basis for grouping them in another category. |
Ignored | Change the label for fields that are excluded from subsequent analysis and modeling. |
Binning and grouping settings | |
Number of bins for numeric fields | Set the initial number of bins used to analyze the values of each numeric field. |
Number of bins for symbolic fields | Set the initial number of bins used to analyze the symbols of each symbolic field. |
Create equal width intervals | Select this option to create equal width intervals by default. |
Ignore ordering |
This option is for symbolic predictors only, and by default, it is enabled.
Select this option to combine a category with others most similar in behavior. When this option is disabled, the order of the symbolic categories is assumed to have some meaning and only the neighboring categories are grouped. |
Use z-score instead of student's test |
The z-score and Student's t-test methods determine whether the behavior in different bins is similar. Student's t-test is the most widely used statistical method for checking whether two sets of data differ significantly.
Select this option for compatibility with previous Prediction Studio versions. |
Auto grouping | Select this option to set auto grouping as a default setting. For more information, see Auto grouping option for predictors. |
Granularity | Set the highest acceptable probability that the difference in behavior between two adjacent intervals is spurious. Reducing the granularity reduces the number of intervals. |
Minimum size (% of the sample) | Set the minimum number of sample cases in each interval, expressed as a percentage of the sample. Use this setting to ensure that each interval contains enough cases for its behavior to be a reliable basis for grouping. Intervals with too few cases are combined with their nearest neighbor. |
Merge bins below minimum size in one residual bin |
This option is for symbolic predictors only.
Bins below the minimum size are combined into a residual bin on the assumption that there are insufficient cases for their behavior to be a basis for predictor grouping. |
Deselect predictors with performance below | Set the minimum level of predictive power for a field to continue as a predictor. |
Display settings | |
Use scientific notation | Select this option to display values in scientific notation. |
Real value precision | Set the number of decimal places that are displayed for real values. |
Performance difference threshold | Set the maximum value for the Performance difference column in the Data analysis step. When you change a predictor's role and its performance difference value is higher than the threshold, the value is highlighted in red. This setting applies to the samples constructed with a validation set. |
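
The following Python sketch illustrates how two of the binning settings above might interact: equal width intervals for a numeric field, and a minimum size expressed as a percentage of the sample. It is not the Prediction Studio implementation. The function names `equal_width_bins` and `merge_small_bins` are hypothetical, and the merge rule (combine a small interval with the adjacent interval that holds fewer cases, preferring the left one on a tie) is an assumption made only for this example.

```python
def equal_width_bins(values, n_bins):
    """Split the value range into n_bins intervals of equal width.

    Returns a list of (lower, upper, case_count) tuples.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # When all values are identical, width is 0 and everything lands in bin 0.
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value falls on the last edge, so clamp it into the last bin.
        counts[min(idx, n_bins - 1)] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(n_bins)]


def merge_small_bins(bins, min_pct):
    """Merge intervals that hold fewer than min_pct percent of all cases.

    Assumption for this sketch: a small interval is merged with the adjacent
    interval that has fewer cases (the left one on a tie).
    """
    total = sum(count for _, _, count in bins)
    min_count = total * min_pct / 100.0
    merged = list(bins)
    while len(merged) > 1:
        small = next((i for i, b in enumerate(merged) if b[2] < min_count), None)
        if small is None:
            break  # every interval meets the minimum size
        if small == 0:
            other = 1
        elif small == len(merged) - 1:
            other = small - 1
        else:
            left, right = merged[small - 1][2], merged[small + 1][2]
            other = small - 1 if left <= right else small + 1
        a, b = sorted((small, other))
        merged[a:b + 1] = [(merged[a][0], merged[b][1], merged[a][2] + merged[b][2])]
    return merged


# Example: 12 cases, 4 initial bins, minimum size of 15% of the sample.
sample = [0, 2, 4, 6, 8, 12, 25, 31, 33, 35, 37, 40]
initial = equal_width_bins(sample, n_bins=4)
final = merge_small_bins(initial, min_pct=15)
```

In this example, the two middle intervals each hold a single case, which is below the 15% minimum, so they are combined into one interval that covers 10 to 30. Raising the minimum size leads to fewer, wider intervals.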