The Preparing data step begins when you connect to a database or upload your data as a CSV file. The columns in the data source are used as predictors; you can define their roles later.
The data is used to create a statistically relevant sample of customer details that can be further divided into different dataset types: development, validation, and testing. The customer data that goes into the development sample is used to develop predictive models. Data in the validation and test samples is used to validate and test the accuracy of the models.
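The division into development, validation, and test samples can be illustrated outside the product with a minimal Python sketch. This is not how PAD performs the split internally; the function name `split_samples`, the 60/20/20 proportions, and the `customer_id` field are assumptions chosen only for illustration.

```python
import random


def split_samples(records, dev_fraction=0.6, val_fraction=0.2, seed=42):
    """Shuffle records and divide them into development, validation, and test samples.

    The fractions are illustrative assumptions, not values prescribed by PAD.
    Whatever remains after the development and validation samples goes to test.
    """
    if not 0 < dev_fraction + val_fraction < 1:
        raise ValueError("dev_fraction + val_fraction must be between 0 and 1")

    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    dev_end = round(len(shuffled) * dev_fraction)
    val_end = dev_end + round(len(shuffled) * val_fraction)
    return {
        "development": shuffled[:dev_end],
        "validation": shuffled[dev_end:val_end],
        "test": shuffled[val_end:],
    }


# Hypothetical data source: one record per customer.
customers = [{"customer_id": i} for i in range(100)]
samples = split_samples(customers)
for name, sample in samples.items():
    print(name, len(sample))
```

Shuffling before slicing keeps each sample representative of the whole data source, and no customer appears in more than one sample.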
The data source contains information about customers and their previous behavior. It should contain one record per customer, with every record in the same structure. Ideally, data should be present for all fields and all customers, but in most circumstances some missing data can be tolerated.
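Before uploading a CSV file, it can help to check these expectations yourself. The following Python sketch is an assumption-laden example, not part of PAD: the sample data, the field names, and the helper `summarize_data_source` are hypothetical. It counts records, reports customer IDs that occur more than once, and counts empty values per field.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; in practice you would read your own CSV file.
SAMPLE_CSV = """customer_id,age,income,churned
C001,34,52000,no
C002,,61000,yes
C003,45,,no
C002,29,48000,no
"""


def summarize_data_source(csv_text, id_field="customer_id"):
    """Report record count, duplicated customer IDs, and missing values per field."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # One record per customer: any ID seen more than once is a duplicate.
    id_counts = Counter(row[id_field] for row in rows)
    duplicate_ids = sorted(cid for cid, count in id_counts.items() if count > 1)

    # A value counts as missing when it is absent or contains only whitespace.
    missing_per_field = {}
    if rows:
        for field in rows[0].keys():
            missing_per_field[field] = sum(
                1 for row in rows if not (row[field] or "").strip()
            )

    return {
        "records": len(rows),
        "duplicate_ids": duplicate_ids,
        "missing_per_field": missing_per_field,
    }


summary = summarize_data_source(SAMPLE_CSV)
print(summary)
```

In this sample, customer `C002` appears twice and the `age` and `income` fields each have one empty value, so the duplicate would need to be resolved while the small amount of missing data might be acceptable.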
Based on your model selection and outcome field categorization, PAD generates data that you can view in the Graphical view tab and the Data view tab.