Preparing data

The Data preparation step begins when you connect to a database or upload your data from a data set or a CSV file. By default, all columns in the data source are used as predictors, but you can define their roles later. For more information, see Defining the predictor role.

The data is used to create a statistically relevant sample of customer details, which can then be split into separate samples for development, validation, and testing. The customer data in the development sample is used to build predictive models. The data in the validation and test samples is used to validate the models and test their accuracy.
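
Prediction Studio performs this split for you. Purely to illustrate the idea, the following Python sketch randomly partitions customer records into development, validation, and test samples. The `split_samples` function, the 60/20/20 proportions, and the fixed seed are hypothetical choices for this example, not Prediction Studio settings.

```python
import random


def split_samples(records, dev_frac=0.6, val_frac=0.2, seed=42):
    """Randomly partition records into development, validation, and test samples.

    Hypothetical helper: the default fractions are illustrative assumptions.
    Whatever remains after the development and validation shares goes to test.
    """
    if dev_frac < 0 or val_frac < 0 or dev_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(records)                # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle gives a reproducible split
    n = len(shuffled)
    n_dev = int(n * dev_frac)
    n_val = int(n * val_frac)
    development = shuffled[:n_dev]
    validation = shuffled[n_dev:n_dev + n_val]
    test = shuffled[n_dev + n_val:]
    return development, validation, test


# Usage: ten hypothetical customer records split 60/20/20.
customers = [{"customer_id": i} for i in range(10)]
dev, val, test = split_samples(customers)
print(len(dev), len(val), len(test))  # prints "6 2 2"
```

Shuffling before slicing keeps each sample representative of the whole data set, and every record lands in exactly one sample, so the models are never evaluated on the same customers they were built from.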

The data source contains information about customers and their past behavior. It should contain one record per customer, with every record in the same structure. Ideally, every field is populated for every customer, but in most cases some missing data can be tolerated.
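
Before loading a CSV file, you might want to check it against these requirements yourself. The following Python sketch is one way to do that, assuming the records are already parsed into dictionaries. The `profile_records` function and the field names (`customer_id`, `age`, `last_purchase`) are hypothetical. The sketch reports duplicate customer IDs, records whose fields differ from the first record, and the fraction of missing values in each field.

```python
from collections import Counter


def profile_records(records, key="customer_id"):
    """Check the basic shape of a customer data set (hypothetical helper).

    Returns a tuple of:
      - customer IDs that occur more than once,
      - indexes of records whose field set differs from the first record,
      - the fraction of missing (None or empty-string) values per field.
    """
    if not records:
        return [], [], {}
    expected_fields = set(records[0])
    counts = Counter(r.get(key) for r in records)
    duplicates = sorted(k for k, c in counts.items() if c > 1)
    mismatched = [i for i, r in enumerate(records) if set(r) != expected_fields]
    # A field that is absent from a record also counts as missing.
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, "")) / len(records)
        for f in sorted(expected_fields)
    }
    return duplicates, mismatched, missing


# Usage with hypothetical records: customer 2 appears twice, and
# the last record lacks the last_purchase field.
records = [
    {"customer_id": 1, "age": 34, "last_purchase": "2023-01-05"},
    {"customer_id": 2, "age": None, "last_purchase": "2023-02-11"},
    {"customer_id": 2, "age": 41, "last_purchase": ""},
    {"customer_id": 3, "age": 29},
]
duplicates, mismatched, missing = profile_records(records)
print(duplicates, mismatched, missing)
```

Duplicates and structural mismatches usually need fixing before upload. Missing values are reported as fractions because a small share of gaps is often acceptable.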

Based on your model selection and the categorization of the outcome field, Prediction Studio generates data that you can view on the Graphical view and Tabular view tabs. For more information, see Defining an outcome.