Running out of memory when using NLP
If you experience high memory consumption or out-of-memory errors when using text analytics in Pega Platform, use the following best practices to improve your system performance.
Cause
The system might run out of memory because of issues related to the following factors:
- Text size.
- Insufficient heap size of the JVM on which you run your application.
- Inefficient regular expressions in Ruta scripts.
Solution
To ensure continuous text analysis and avoid out-of-memory issues, follow these recommendations for the natural language processing (NLP) engine:
- Check whether the analyzed text is of the recommended size for optimal processing. A large text in NLP terminology has the following attributes:
  - Without sentiment analysis enabled:
    - 25,000 characters (max)
    - 5,000 words (max)
  - With sentiment analysis enabled:
    - 5,000 characters (max)
    - 1,000 words (max)
- Check whether the maximum heap size of the JVM is within the recommended range of 16-24 GB, for example, by setting the `-Xmx` JVM option accordingly.
- Optimize your Ruta scripts to avoid storing rule element matches and rule matches, and remove unnecessary annotations, as shown in the sketch after this list:
  - Mark the entity types as `UNMARK`.
  - Minimize the use of other annotations when you do not need them, for example, `BREAK` and `SPACE`.
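The following snippet is a minimal Apache UIMA Ruta sketch of these recommendations; the `Candidate` type and the `"Acme"` pattern are hypothetical placeholders for your own entity rules:

```ruta
PACKAGE com.example.nlp;

// Hypothetical helper type, used only for this illustration.
DECLARE Candidate;

// Prefer specific token matches over broad quantified patterns such as
// (ANY)+, which force the engine to keep many intermediate rule element
// matches in memory.
W{REGEXP("Acme") -> MARK(Candidate)};

// Remove helper annotations once they are no longer needed so that they
// are not stored for the rest of the analysis.
Candidate{-> UNMARK(Candidate)};

// Retain whitespace annotations such as SPACE and BREAK only for the
// rules that actually need them, then restore the default filtering.
RETAINTYPE(SPACE, BREAK);
// ... rules that depend on whitespace go here ...
RETAINTYPE;
```

Removing helper annotations as soon as the final entities are marked keeps the annotation indexes small, which reduces the memory that the engine holds for each analyzed document.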