Estimating the model monitoring payload

Updated on July 5, 2022

Starting in Pega Platform version 8.7, the monitoring of model predictors (input) and output is enabled by default. The system requirements to support this feature are 250 GB of Stream service disk space and 15 GB of analytics repository disk space.

These requirements apply when the monitor percentage is set to the default value of 5%, which defines the proportion of all model executions that are monitored.

The estimated size of the monitoring payload written to the Stream service and of the files written to the analytics repository is based on various parameters, such as the number of common inputs, model executions, and parameterized predictors used during strategy execution.

The estimate is based on the following assumptions:

  • 20% of predictors and inputs have lengthy names (100 characters).
  • 20% of predictors have names of medium length (15 characters).
  • The remaining predictors have short names (10 characters).
  • 40% of predictors (inputs) are of the symbolic data type, each with a text value of 15 characters.
  • 60% of predictors are of the numeric data type, each with a random integer value.
  • Each symbolic predictor has around 15 categories.
Note: For the detailed calculations that are the basis of the disk size requirements for the Stream service and the analytics repository, see the model monitoring payload sizing sheet.
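Based on the name-length mix above, the average predictor-name length can be estimated with a simple weighted sum. The following is an illustrative calculation only; the authoritative figures come from the model monitoring payload sizing sheet, which may weight things differently.

```python
# Weighted average predictor-name length from the stated assumptions:
# 20% lengthy (100 chars), 20% medium (15 chars), 60% short (10 chars).
name_mix = [
    (0.20, 100),  # lengthy names
    (0.20, 15),   # medium-length names
    (0.60, 10),   # short names
]
avg_name_length = sum(share * length for share, length in name_mix)
print(avg_name_length)  # 29.0 characters on average per predictor name
```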

Stream service sizing

The Stream service payload is a serialized DecisionResult object, which consists of the following elements:

  • CommonInputs – A list of all inputs used across a decision, and their values.
  • DecisionResultID – A unique string that identifies a decision.
  • ModelExecutionResults – Model-specific information, such as the list of parameterized predictors and their values, the outputs of model execution, the state of model execution (success or failure), and the model ID.
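The structure above could be rendered as follows. This is a hypothetical Python sketch for orientation only; the actual DecisionResult is a Pega-internal serialized object, so the field names and types here are illustrative, not the real schema.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ModelExecutionResult:
    """Illustrative stand-in for one entry in ModelExecutionResults."""
    model_id: str
    parameterized_predictors: dict[str, Any]  # predictor name -> value
    outputs: dict[str, Any]                   # outputs of model execution
    succeeded: bool                           # state: success or failure

@dataclass
class DecisionResult:
    """Illustrative stand-in for the serialized Stream service payload."""
    decision_result_id: str        # unique string identifying a decision
    common_inputs: dict[str, Any]  # all inputs used across the decision
    model_execution_results: list[ModelExecutionResult] = field(default_factory=list)
```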

The estimated size of the payload written to the Stream service is based on the following assumptions:

  • 60% of predictors have a size of 40 bytes.
  • 20% of predictors have a size of 70 bytes.
  • 20% of predictors have a size of 110 bytes.
  • The maximum Stream service disk size is based on the worst-case scenario in which messages are pumped into the stream but never read from it.
  • The minimum Stream service disk size assumes that data is processed from the Stream service queue for monitoring at 20% of the incoming input rate, so at any given time, 80% of the overall Stream service disk size is required to accommodate the data.

The size of an individual payload written to the Stream service depends on the number of model executions, the number of common inputs, and the number of parameterized predictors. The required overall size of the Stream service is a function of individual payload sizes, the number of decision results generated, and the sampling percentage used.
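The relationships above can be sketched as a rough calculator. The per-decision overhead figure is an assumption introduced here (to cover the DecisionResultID and serialization framing), not a value from the sizing sheet; the predictor byte sizes are the ones stated in the assumptions above.

```python
# Average predictor size from the stated mix: 60% at 40 B, 20% at 70 B, 20% at 110 B.
AVG_PREDICTOR_BYTES = 0.60 * 40 + 0.20 * 70 + 0.20 * 110  # = 60.0 bytes

def estimate_payload_bytes(model_executions, common_inputs,
                           parameterized_predictors, overhead_bytes=200):
    """Rough size of one DecisionResult payload.

    overhead_bytes is an assumed allowance for the DecisionResultID and
    serialization framing; it is not from the sizing sheet.
    """
    per_model = parameterized_predictors * AVG_PREDICTOR_BYTES
    return (common_inputs * AVG_PREDICTOR_BYTES
            + model_executions * per_model
            + overhead_bytes)

def estimate_stream_disk_bytes(payload_bytes, decisions,
                               monitor_pct=0.05, worst_case=True):
    """Overall Stream service footprint: payload size x sampled decisions.

    Worst case assumes messages are never read from the stream; otherwise
    80% of that figure applies (processing at 20% of the incoming rate).
    """
    total = payload_bytes * decisions * monitor_pct
    return total if worst_case else total * 0.80
```

For example, a decision with 50 common inputs and 2 model executions of 10 parameterized predictors each yields roughly 4.4 KB per sampled payload; at one million decisions and the default 5% monitor percentage, that is about 220 MB in the worst case.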

Analytics repository sizing

The estimated sizes of the files written to the analytics repository are based on the following assumptions:

  • 60% of predictors are numeric. The size of each numeric input is 2,500 bytes.
  • 40% of predictors are symbolic. Each symbolic predictor has around 16 categories, and each predictor name and value is around 60 characters long. The size of each symbolic input in the summary file is 1,800 bytes.
  • The size of a distribution summary is 1,024 bytes. Another 100 bytes are added to the size of the ModelExecutionResult object to account for model IDs.
  • Records are processed at 20% of the incoming input rate. This assumption has significant ramifications with regard to the number of records that are processed to generate a single file, and therefore, to the total number of files written to the repository and the overall repository size.

The size of the individual files is primarily a function of the number of model executions, the number of parameterized predictors, and the number of common inputs. The overall file size is dependent on the total number of decision results, the sampling percentage, and the rate at which the system can process the incoming decision results.
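A rough per-file estimate can be sketched directly from the byte figures above. This is a simplified reading of those assumptions, not the sizing sheet's formula; the exact file layout may differ.

```python
# Byte figures from the assumptions above.
NUMERIC_INPUT_BYTES = 2_500        # per numeric input in the summary file
SYMBOLIC_INPUT_BYTES = 1_800       # per symbolic input in the summary file
DISTRIBUTION_SUMMARY_BYTES = 1_024 # per distribution summary
MODEL_ID_OVERHEAD_BYTES = 100      # added per ModelExecutionResult for model IDs

def estimate_summary_file_bytes(total_predictors, model_executions):
    """Rough summary-file size, assuming the stated 60/40 numeric/symbolic
    split and one distribution summary per model execution."""
    numeric = 0.60 * total_predictors * NUMERIC_INPUT_BYTES
    symbolic = 0.40 * total_predictors * SYMBOLIC_INPUT_BYTES
    per_execution = DISTRIBUTION_SUMMARY_BYTES + MODEL_ID_OVERHEAD_BYTES
    return numeric + symbolic + model_executions * per_execution
```

For example, 100 predictors and 2 model executions give roughly 224 KB per summary file under these assumptions.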

Database sizing

Model input and output monitoring does not typically impact the database size in standard installations.
