This content has been archived and is no longer being updated.

Links may not function; however, this content may be relevant to outdated versions of the product.

Genetic algorithm settings

Updated on March 11, 2021

In the Genetic algorithm section of a predictive model, you can change default values of selected options for creating these algorithms.

Genetic algorithm tab settings

Add pool
Add a pool (or population) of predictive models under the control of the genetic algorithm. Click the name of the pool to change its settings.
Default settings
Define the settings for the construction and operation of a pool.

Pool details dialog box settings

Pool details
Pool name
Enter a name for the pool.
Pool size
Enter the number of models in the pool.
Technique
Technique
Select the type of genetic algorithm that is used for developing the pool:
Generational
Each generation creates an entirely new pool of models by selecting the fittest ones from the original pool as parents, and recombining them to produce new offspring.
Steady state
Each generation replaces a certain number of models from the pool. In each generation, the fittest models from the original pool are selected as parents and recombined to produce new offspring. The new offspring replace the worst models in the original pool. This algorithm tends to converge faster than the generational algorithm.
Hill climbing
Each generation uses every model as a parent. After randomly selecting another parent, the offspring are created by recombining the parents. The offspring replace the parents only if they are fitter than the parents. This ensures a monotonically increasing average fitness.
Simulated annealing

This algorithm uses mutation to create offspring. Each generation mutates every model to create new offspring. If the fitness of the offspring is better than that of their parent, they replace the parent. Otherwise, the offspring can still be accepted with a probability determined by the Boltzmann equation, which depends on the difference in fitness divided by the current temperature. After each generation, the temperature is decreased by using the specified decrease factor. The simulated annealing algorithm is designed to circumvent premature convergence at early stages.

If the best and average performance in a pool have not improved for several generations, try switching to this technique to produce new models and, after some time, select one of the other genetic algorithm techniques.
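The acceptance rule for simulated annealing can be illustrated with a short sketch. This is not the product's implementation: the function names are hypothetical, higher fitness is assumed to be better, and the acceptance probability is assumed to take the common Boltzmann form exp(-(loss in fitness) / temperature).

```python
import math
import random


def acceptance_probability(parent_fitness, offspring_fitness, temperature):
    """Probability that an offspring replaces its parent (assumed Boltzmann form)."""
    if offspring_fitness > parent_fitness:
        return 1.0  # fitter offspring always replace the parent
    if temperature <= 0:
        return 0.0  # no tolerance left for worse offspring
    loss = parent_fitness - offspring_fitness
    return math.exp(-loss / temperature)


def accept_offspring(parent_fitness, offspring_fitness, temperature, rng):
    """Draw a random number to decide whether the offspring is accepted."""
    probability = acceptance_probability(parent_fitness, offspring_fitness, temperature)
    return rng.random() < probability
```

Under these assumptions, the same loss in fitness is accepted less often as the temperature falls, which is why the technique explores widely in early generations and settles down later.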

Optimize new (sub)models
Select this option to optimize the parameters of each part of the model as it is changed. This ensures the best use of the predictive information is made throughout the model.
Sampling mechanism
Select the sampling mechanism that is used for developing the pool:
Stochastic universal
The relative fitness of a model, when compared to all other models in the pool, determines the probability of its selection as a parent. This is known as the stochastic universal version of the roulette wheel selection. The stochastic universal mechanism produces a selection that reflects the relative fitness of the models more accurately than the roulette wheel mechanism.
Roulette wheel
The relative fitness of a model, when compared to all other models in the pool, determines the probability of its selection as a parent. This is known as roulette wheel selection because the process is similar to spinning a roulette wheel on which fitter models have more numbers than less fit models, so there is a greater probability of selecting a highly fit model. The wheel is spun once to select each parent.
Tournament
This method randomly picks a certain number of models as contestants for a tournament. The fittest model in this collection wins the tournament and is selected as a parent.
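The three sampling mechanisms can be compared in a minimal sketch. The function names and the list-of-fitness representation are assumptions made for illustration, and the stochastic universal variant shown is the textbook form (one spin, evenly spaced pointers), which may differ from the product's implementation.

```python
import random


def roulette_wheel(fitness, n_parents, rng):
    """Spin the wheel once per parent; each slice is proportional to fitness."""
    total = sum(fitness)
    picks = []
    for _ in range(n_parents):
        spin = rng.random() * total
        cumulative = 0.0
        chosen = len(fitness) - 1  # fallback guards against rounding error
        for index, value in enumerate(fitness):
            cumulative += value
            if spin < cumulative:
                chosen = index
                break
        picks.append(chosen)
    return picks


def stochastic_universal(fitness, n_parents, rng):
    """Spin once and read off n_parents evenly spaced pointers."""
    step = sum(fitness) / n_parents
    start = rng.random() * step
    picks = []
    index = 0
    cumulative = fitness[0]
    for k in range(n_parents):
        pointer = start + k * step
        while pointer >= cumulative and index < len(fitness) - 1:
            index += 1
            cumulative += fitness[index]
        picks.append(index)
    return picks


def tournament(fitness, tournament_size, rng):
    """Pick random contestants and return the index of the fittest one."""
    contestants = rng.sample(range(len(fitness)), tournament_size)
    return max(contestants, key=lambda i: fitness[i])
```

With fitness values of 4, 3, 2, and 1 and ten parents to select, the stochastic universal sketch picks each model in exact proportion to its fitness, whereas repeated roulette spins only match those proportions on average.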
Scaling method
Select the scaling method that is used for developing the pool:
No scaling
The raw fitness values are used to determine the selection probabilities of models. However, this can lead to premature convergence when some of the models have exceptionally high fitness values. In that case, rescale the fitness values by using one of the alternative scaling methods instead of using the raw values.
Rank linear

Using this method, the fittest model is given a fitness of s, where s is between 1 and 2. The worst model is given a fitness of 2 - s. Intermediate models get the fitness value given by the following interpolation formula:

$$f(i) = s - \frac{2(i - 1)(s - 1)}{N - 1}$$

where i = 1, ..., N is the rank of the model (1 for the fittest) and N is the number of models in the pool.

Rank exponential
Exponential ranking gives more chance to the worst models at the expense of those above average. The fittest model gets a fitness of 1.0, and the second best is given a fitness of s (typically, about 0.99). The third best is assigned $s^2$, and so on. The last one receives $s^{N-1}$.
Linear

Linear scaling adjusts the fitness values of all models in such a way that models with average fitness get a fixed number of expected offspring.

If the minimum yields a positive scaled value:

$$f' = f \cdot \frac{s - 1}{f_{\max} - f_{\text{avg}}} + \frac{f_{\max} - s \cdot f_{\text{avg}}}{f_{\max} - f_{\text{avg}}}$$

Otherwise:

$$f' = f \cdot \frac{1}{f_{\text{avg}} - f_{\min}} - \frac{f_{\min}}{f_{\text{avg}} - f_{\min}}$$

Here f is the raw fitness, f' is the scaled fitness, s is the scaling parameter, and $f_{\max}$, $f_{\text{avg}}$, and $f_{\min}$ are the maximum, average, and minimum raw fitness in the pool. In both cases, the average always gets a scaled value of 1. In the first case, the maximum is assigned a scaled value of s, whereas, in the second case, the minimum is mapped to 0.

Windowing
This scaling method introduces a moving baseline. The worst value observed in the most recent generations is subtracted from the fitness values. The number of generations that are considered is known as the window size, which is typically between 2 and 10. This scaling method increases the chance of selecting the worst model, which prevents the pool from prematurely optimizing around the current best model.
Sigma
This scaling method dynamically determines a baseline based on the standard deviation. It sets the baseline at s standard deviations below the mean, where s is the scaling factor, typically between 2 and 5.
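The scaling methods above can be sketched as follows. The function names are hypothetical, higher fitness is assumed to be better, and two details are assumptions that the text does not state: scaled values below the baseline are clipped to 0, and the sigma method uses the population standard deviation.

```python
import statistics


def rank_linear(n, s):
    """Fitness by rank i = 1..n (1 = fittest), running from s down to 2 - s."""
    return [s - 2 * (i - 1) * (s - 1) / (n - 1) for i in range(1, n + 1)]


def rank_exponential(n, s):
    """Fitness by rank: 1, s, s^2, ..., s^(n-1)."""
    return [s ** k for k in range(n)]


def linear_scale(fitness, s):
    """Map the average to 1 and, where possible, the maximum to s."""
    low, high = min(fitness), max(fitness)
    avg = statistics.fmean(fitness)
    if high == avg:  # every model is equally fit, so there is nothing to scale
        return [1.0] * len(fitness)
    slope = (s - 1) / (high - avg)
    offset = (high - s * avg) / (high - avg)
    if slope * low + offset >= 0:  # the minimum stays non-negative
        return [slope * f + offset for f in fitness]
    return [(f - low) / (avg - low) for f in fitness]  # map the minimum to 0


def window_scale(fitness, worst_history, window):
    """Subtract the worst fitness seen in the last `window` generations."""
    baseline = min(worst_history[-window:])
    return [max(f - baseline, 0.0) for f in fitness]  # clipping is an assumption


def sigma_scale(fitness, s):
    """Subtract a baseline placed s standard deviations below the mean."""
    baseline = statistics.fmean(fitness) - s * statistics.pstdev(fitness)
    return [max(f - baseline, 0.0) for f in fitness]  # clipping is an assumption
```

For example, rank_linear(5, 1.5) assigns 1.5 to the fittest model and 0.5 to the worst, so the five scaled values still average 1.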
Elite size
Enter the number of top-performing models in one generation that are carried over to the next generation. Enter 1 to prevent the pool from losing its best model.
Replacement count
Enter the number of models to replace at each generation of the steady state algorithm.
Tournament size
Enter the number of tournament contestants for the tournament sampling.
Scaling parameter
Enter the number for the parameter or parameters that are used in each scaling method for fine-tuning.
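How the elite size and the replacement count might act on a pool can be sketched as follows. This is only an illustration: the function names and the parallel-list representation of models and fitness values are assumptions, and higher fitness is assumed to be better.

```python
def carry_elite(pool, fitness, elite_size):
    """Return the elite_size fittest models, which survive unchanged."""
    ranked = sorted(range(len(pool)), key=lambda i: fitness[i], reverse=True)
    return [pool[i] for i in ranked[:elite_size]]


def replace_worst(pool, fitness, offspring):
    """Steady state step: the offspring overwrite the worst models in the pool.

    The replacement count is taken to be len(offspring).
    """
    worst_first = sorted(range(len(pool)), key=lambda i: fitness[i])
    new_pool = list(pool)
    for slot, child in zip(worst_first, offspring):
        new_pool[slot] = child
    return new_pool
```

With a pool of four models and a replacement count of 2, the two least fit models are overwritten and the other two keep their places.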
Model construction
Use bivariate statistics
Select this option to use the operators and their parameters that are identified as best at modeling the interactions between predictors when you create a bivariate model.
Use predictor groups
Select this option to use one predictor from each of the groups that are identified during predictor grouping and only replace a predictor with another one from the same group. This option prevents the inclusion of duplicate predictors and minimizes the size of the model that is required to incorporate all information. Clear this option to increase model depth and allow more freedom to the genetic algorithm.
Enable intelligent genetics
Enable intelligent genetics to develop non-linear models (where non-linearity is assumed from the outset) that might outperform models that are developed by structural genetics. This strategy initially generates models with a lower performance, and it is a slower and computationally more expensive process. The result is models of identical size and, if the relationship between data and behavior is non-linear, these models have greater predictive power.
Enable structural genetics
Structural genetics is the default strategy to develop near-linear models that are at least as powerful as regression models. Non-linear operators are introduced only where they improve performance. Initially, structural genetics generates models with higher performance, and model generation is faster. The result is models of variable size with greater data efficiency, which translates into achieving more power from the same data. The models are easier to understand because they are more linear and robust, and they are more likely to perform as expected on different data.
Maximum tree depth
Specify the maximum number of levels in the models. For balanced models, the minimum number of nodes is given by the following formula:
$$\text{nodes} = 2 \times (\text{number of predictors} + \text{number of constants}) - 1$$
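As a worked example of the formula, the following sketch computes the node count and, as an extra step that the text does not state, the number of levels in a balanced tree. It assumes that every operator takes exactly two arguments and that each predictor and constant appears once as a leaf; the function name is hypothetical.

```python
def balanced_model_size(n_predictors, n_constants):
    """Return (nodes, levels) for a balanced binary model tree."""
    leaves = n_predictors + n_constants
    nodes = 2 * leaves - 1  # leaves plus (leaves - 1) binary operators
    # Levels needed to hold `leaves` leaves in a balanced binary tree,
    # computed as ceil(log2(leaves)) + 1 with integer arithmetic.
    levels = (leaves - 1).bit_length() + 1
    return nodes, levels
```

For 6 predictors and 2 constants, this gives 15 nodes spread over 4 levels, so a maximum tree depth below 4 would not fit such a model under these assumptions.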
Crossover mutation
Crossover probability
Specify the probability of crossover occurrence during the creation of the offspring. Crossover is the process of creating models by exchanging branches of parent trees.
Mutation probability
Specify the probability of mutation occurrence on the created offspring. Mutation is the random alteration of a (randomly selected) node in a model.
Branch replacement
Specify the probability of replacing whole branches with randomly created ones during mutation.
Node replacement
Specify the probability of changing only the type of a node in a model.
Argument swapping
Specify the probability of changing the child order (argument order) of a node in a model.
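The crossover and mutation settings can be pictured with a small sketch on expression trees. Everything here is illustrative: models are represented as nested lists of the form [operator, left, right], the operator and leaf names are made up, and the three mutation probabilities are assumed to be applied independently to one randomly chosen node.

```python
import copy
import random

OPERATORS = ["add", "sub", "mul"]    # hypothetical node types
LEAVES = ["x1", "x2", "x3", "1.0"]   # hypothetical predictors and constants


def random_tree(rng, depth):
    """Build a random branch with at most `depth` levels."""
    if depth <= 1 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return [rng.choice(OPERATORS),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1)]


def operator_nodes(tree):
    """Collect every operator node in the tree, root first."""
    if not isinstance(tree, list):
        return []
    return [tree] + operator_nodes(tree[1]) + operator_nodes(tree[2])


def crossover(parent_a, parent_b, rng):
    """Exchange one randomly chosen branch between copies of the parents."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    nodes_a, nodes_b = operator_nodes(child_a), operator_nodes(child_b)
    if nodes_a and nodes_b:
        node_a, node_b = rng.choice(nodes_a), rng.choice(nodes_b)
        slot_a, slot_b = rng.choice([1, 2]), rng.choice([1, 2])
        node_a[slot_a], node_b[slot_b] = node_b[slot_b], node_a[slot_a]
    return child_a, child_b


def mutate(tree, rng, p_branch, p_node, p_swap, depth=3):
    """Alter one randomly selected node of a copy of the tree."""
    child = copy.deepcopy(tree)
    nodes = operator_nodes(child)
    if not nodes:
        return child
    node = rng.choice(nodes)
    if rng.random() < p_branch:   # branch replacement
        node[rng.choice([1, 2])] = random_tree(rng, depth)
    if rng.random() < p_node:     # node replacement: change only the type
        node[0] = rng.choice(OPERATORS)
    if rng.random() < p_swap:     # argument swapping
        node[1], node[2] = node[2], node[1]
    return child


def breed(parent_a, parent_b, rng, p_crossover, p_mutation,
          p_branch, p_node, p_swap):
    """Create two offspring by using the crossover and mutation probabilities."""
    children = (copy.deepcopy(parent_a), copy.deepcopy(parent_b))
    if rng.random() < p_crossover:
        children = crossover(parent_a, parent_b, rng)
    return [mutate(c, rng, p_branch, p_node, p_swap)
            if rng.random() < p_mutation else c
            for c in children]
```

In this sketch, crossover only moves existing branches between the two offspring, so new structure enters the pool through branch replacement alone.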
Simulated annealing
Initial temperature
Specify the initial value of the temperature that controls the amount of change to models.
Temperature decrease
Specify the rate at which the temperature decreases with each generation.
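The interplay of the two simulated annealing settings can be sketched as a cooling schedule. The sketch assumes that the decrease is multiplicative (the temperature is multiplied by a factor between 0 and 1 after each generation), which the text does not confirm; the function name is hypothetical.

```python
def temperature_schedule(initial_temperature, decrease_factor, generations):
    """List the temperature used in each generation under geometric cooling."""
    schedule = []
    temperature = initial_temperature
    for _ in range(generations):
        schedule.append(temperature)
        temperature *= decrease_factor  # assumed multiplicative decrease
    return schedule
```

Under this assumption, an initial temperature of 1.0 and a factor of 0.5 halve the temperature each generation, so a factor closer to 1 keeps the pool exploratory for longer.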
