File descriptor limits

Updated on July 5, 2022

This content applies only to On-premises and Client-managed cloud environments

Stream-related issues can be caused by file descriptor limits. You can prevent such issues by setting an appropriate limit and by identifying problems in advance.

Set the right limit

The Stream service uses file descriptors for log segments and open connections. As a starting point, allow at least 100000 file descriptors for the broker processes. To determine whether this limit is sufficient, perform the following calculation: number of file descriptors = number required for log segments + number of open connections.

To determine the number of file descriptors required for log segments, use one of the following formulas:

  • Number of file descriptors required for log segments = (number_of_partitions) x (partition_size / segment_size)
    Note: The default number of partitions is 6 or 20 for one topic, depending on your version. The default segment size (segment.bytes) is 1073741824 and the partition size depends on the volume of messages.
  • Number of file descriptors required for log segments = (number_of_partitions) x (retention period / segment roll-up period)
    Note: The default number of partitions is 6 or 20 for one topic, depending on your version. The default retention period is 216000000 and the default segment roll-up period is 604800000.
  • Minimum number of file descriptors required for log segments = number_of_partitions x 3

Calculate the number of file descriptors required for connections per node using the following formulas:

  • Total connections for producers = number of producers = number of topics x 2
  • Total connections for consumers = number of consumers = number of topics x number of consumer threads

Two additional connections per node are used to query metadata, offsets, and so on. The number of open connections depends on whether a producer or consumer is idle or active, as connections might be closed when idle.
The compression codec that you choose also affects throughput and bandwidth usage:

  • Gzip requires less bandwidth and disk space, but might not saturate your network, because its maximum throughput is reached first.
  • Snappy is much faster than gzip, but its compression ratio is lower, which means that throughput might be limited when the maximum network capacity is reached.
  • LZ4 maximizes performance.

[Table and diagram: throughput and bandwidth usage per codec]

Note: Kafka uses around 150 file descriptors to load libraries.
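The sizing formulas above can be combined in a short script. The sketch below uses illustrative assumptions (20 partitions, 10 GiB of data per partition, 5 topics, 4 consumer threads per topic), not required settings; substitute values from your own topology.

```shell
#!/bin/sh
# Sketch: estimate required file descriptors using the formulas above.
# All concrete values are assumptions for illustration.

partitions=20                 # number_of_partitions (default is 6 or 20 per topic)
partition_bytes=10737418240   # assumed partition size: 10 GiB of retained data
segment_bytes=1073741824      # default segment.bytes: 1 GiB

# Log segments: number_of_partitions x (partition_size / segment_size)
fd_segments=$(( partitions * (partition_bytes / segment_bytes) ))

# Minimum for log segments: number_of_partitions x 3
fd_min=$(( partitions * 3 ))

# Connections, assuming 5 topics and 4 consumer threads per topic
topics=5
threads=4
conn_producers=$(( topics * 2 ))        # producers: topics x 2
conn_consumers=$(( topics * threads ))  # consumers: topics x threads
conn_meta=2                             # per-node metadata/offset connections

fd_total=$(( fd_segments + conn_producers + conn_consumers + conn_meta ))
echo "segments=$fd_segments minimum=$fd_min total=$fd_total"
```

Use the larger of the segment estimate and the minimum (partitions x 3), then add headroom for the roughly 150 descriptors that Kafka uses to load libraries.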

Monitor the usage of file descriptors to identify issues in advance

  • To find the maximum limit in the system, run cat /proc/sys/fs/file-max
  • To find the maximum limit for the Kafka process, run cat /proc/<kafka process ID>/limits

To monitor the file descriptors in use, use one of the following commands:
  • lsof | grep <kafka process ID> | wc -l
    Note: This number should remain under the current limit of the Kafka process (Max open files).
  • cat /proc/sys/fs/file-nr
    Note: The number of used descriptors should remain under the current limit of the environment.

Note: Sometimes file descriptors that correspond to deleted segments are not reclaimed immediately by the operating system. To check whether this is the case, run the following command: lsof | grep <kafka process ID> | grep deleted | wc -l. Restart the node to release file descriptors related to deleted segments only if the output is large. These commands are specific to Linux; use equivalent commands on other operating systems.
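The system-wide check can be automated. The following is a minimal sketch for Linux that reads /proc/sys/fs/file-nr and warns when usage crosses an assumed 80% threshold:

```shell
#!/bin/sh
# Sketch: warn when system-wide file descriptor usage approaches the limit.
# Linux-specific: /proc/sys/fs/file-nr reports "used unused maximum".

read -r used unused max < /proc/sys/fs/file-nr

# Assumed warning threshold: 80% of the system-wide maximum.
threshold=$(( max * 80 / 100 ))

if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $used of $max file descriptors in use"
else
    echo "OK: $used of $max file descriptors in use"
fi
```

A per-process variant would compare the lsof count against the Max open files value from /proc/<kafka process ID>/limits instead.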

Recover when the number of file descriptors reaches the limit

If the number of file descriptors in use is close to the limit:

  • Restart the node to clear deleted segments that might not have been released.
  • Increase the file descriptor limit.

If the number of file descriptors in use is constantly increasing, see Configuring the Stream service.
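Raising the limit can be sketched as follows; the kafka user name is an assumption, and 100000 matches the starting point recommended above:

```shell
#!/bin/sh
# Sketch: inspect the current file descriptor limits for this shell.
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit

# Persistent change (requires root): add lines like these to
# /etc/security/limits.conf, where 'kafka' is the assumed broker user:
#   kafka  soft  nofile  100000
#   kafka  hard  nofile  100000
# For systemd-managed brokers, set LimitNOFILE=100000 in the service unit
# instead, because systemd services do not read limits.conf.
```

Restart the broker process after raising the limit so that the new value takes effect.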

Use commands to gather file descriptor details

Useful commands

Command                                                                   Output
cat /proc/sys/fs/file-max                                                 maximum file descriptor limit
cat /proc/<kafka process ID>/limits                                       limits for a process
cat /proc/sys/fs/file-nr                                                  file descriptors in use
lsof | grep <kafka process ID> | grep TCP | wc -l                         total connections for a Kafka process
lsof | grep <kafka process ID> | grep jars | wc -l                        total jars for a Kafka process
lsof | grep <kafka process ID> | grep PYFTSINCREMENTALINDEXER-3 | wc -l   number of threads for a given topic
