File descriptor limits
This content applies only to On-premises and Client-managed cloud environments
Stream-related issues can be caused by file descriptor limits. You can prevent them by setting the right limit and by identifying problems in advance.
Set the right limit
The Stream service uses file descriptors for log segments and open connections. As a starting point, allow at least 100000 file descriptors for the broker processes. You can verify the limit that a running broker actually has, as shown below.
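To read the broker's current limit from /proc, the following is a minimal Linux sketch; the pgrep pattern is an assumption and depends on how your broker is launched:

# Assumption: the broker's command line contains "kafka"; adjust the pattern as needed
KAFKA_PID=$(pgrep -f kafka | head -n 1)
grep "Max open files" /proc/"$KAFKA_PID"/limits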
To determine whether this limit is sufficient, calculate the required number of file descriptors: number of file descriptors = number required for log segments + number of open connections.
To calculate the number of file descriptors required for log segments, use one of the following formulas (a worked example follows the list):
- Number of file descriptors required for log segments = number_of_partitions x (partition_size / segment_size)
Note: The default number of partitions is 6 or 20 for one topic, depending on your version. The default segment size (segment.bytes) is 1073741824 bytes (1 GB), and the partition size depends on the volume of messages.
- Number of file descriptors required for log segments = number_of_partitions x (retention_period / segment_roll_up_period)
Note: The default number of partitions is 6 or 20 for one topic, depending on your version. The default retention period is 216000000 milliseconds and the default segment roll up period is 604800000 milliseconds.
- Minimum number of file descriptors required for log segments = number_of_partitions x 3
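For example, with illustrative numbers only (20 partitions, 50 GB of data per partition, and the default 1 GB segment size), the first formula gives:

# Illustrative sizing: 20 partitions x (50 GB / 1 GB per segment)
echo $(( 20 * (53687091200 / 1073741824) ))   # 20 x 50 = 1000 descriptors for log segments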
Calculate the number of file descriptors required for connections per node by using the following formulas (a worked example follows the list):
- Total connections for producers = number of producers = number_of_topics x 2
- Total connections for consumers = number of consumers = number_of_topics x number_of_threads
Two additional connections per node are used to query metadata, offsets, and so on. The number of open connections depends on whether a producer or consumer is idle or active, because idle connections might be closed.
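For example, with assumed numbers (25 topics, consumers running 4 threads each), the per-node connection count is:

# Illustrative only: producers = topics x 2, consumers = topics x threads, plus 2 per node
echo $(( (25 * 2) + (25 * 4) + 2 ))   # 50 + 100 + 2 = 152 connections per node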
The choice of compression codec also affects throughput and bandwidth usage:
- Gzip requires less bandwidth and disk space, but the algorithm is CPU-intensive, so you might reach maximum throughput without saturating the network.
- Snappy is much faster than gzip, but its compression ratio is lower, which means that throughput might be limited when the maximum network capacity is reached.
- LZ4 maximizes performance.
Review the throughput and bandwidth usage per codec:
[Table and diagram: throughput and bandwidth usage per codec]
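If you need to change the codec on a plain Apache Kafka deployment, topic-level compression can be set with the kafka-configs tool. This is a sketch only: the broker address and topic name are placeholders, and a managed Stream service might control this setting itself:

# Sketch: set LZ4 compression on a topic (placeholder broker address and topic name)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name <topic name> \
  --alter --add-config compression.type=lz4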
Monitor the usage of file descriptors to identify issues in advance
- To know the maximum limit in the system, run:
cat /proc/sys/fs/file-max
- To know the maximum limit for the Kafka process, run:
cat /proc/<kafka process ID>/limits
- To count the file descriptors that the Kafka process has open, run:
lsof | grep <kafka process ID> | wc -l
Note: This number should remain under the current limit of the Kafka process (max open files).
- To know the number of file descriptors in use in the system, run:
cat /proc/sys/fs/file-nr
Note: The number of used descriptors should remain under the current limit of the environment.
- To count the file descriptors that are held by deleted segments, run:
lsof | grep <kafka process ID> | grep deleted | wc -l
Restart the node to release file descriptors related to deleted segments only if this command returns a large number.
These commands are specific to Linux; use equivalent commands in other operating systems.
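To automate this check, the following is a minimal Linux sketch that compares the broker's open descriptors against its limit. The pgrep pattern, the 80 percent threshold, and reading /proc/<PID>/fd instead of lsof are assumptions to adapt to your environment:

# Minimal sketch: warn when the Kafka broker nears its file descriptor limit
KAFKA_PID=$(pgrep -f kafka | head -n 1)   # assumption: command line contains "kafka"
# Soft limit on open files (fourth field of the "Max open files" row)
LIMIT=$(awk '/Max open files/ {print $4}' /proc/"$KAFKA_PID"/limits)
# Count the process's open descriptors directly from /proc
USED=$(ls /proc/"$KAFKA_PID"/fd | wc -l)
# Assumed threshold: warn at 80 percent of the limit
if [ "$USED" -gt $(( LIMIT * 80 / 100 )) ]; then
  echo "WARNING: $USED of $LIMIT file descriptors in use"
fi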
Recover from the number of file descriptors reaching the limit
If the number of file descriptors in use is close to the limit:
- Restart the node to clear deleted segments that might not have been released.
- Increase the file descriptor limit, as shown in the sketch after this list.
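The limit can be raised in the usual Linux ways. This sketch assumes the broker runs as a user named kafka under a systemd unit named kafka.service; adjust both names for your environment:

# Option 1: per-user limit in /etc/security/limits.conf (applies at next login/restart)
echo "kafka soft nofile 100000" >> /etc/security/limits.conf
echo "kafka hard nofile 100000" >> /etc/security/limits.conf

# Option 2: for a systemd-managed broker, add to
# /etc/systemd/system/kafka.service.d/override.conf:
#   [Service]
#   LimitNOFILE=100000
# and then reload and restart:
systemctl daemon-reload
systemctl restart kafka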
Use commands to gather file descriptor details
Useful commands
| Command | Output |
| --- | --- |
| cat /proc/sys/fs/file-max | maximum file descriptor limit |
| cat /proc/<kafka process ID>/limits | limits against a process |
| cat /proc/sys/fs/file-nr | file descriptors in use |
| lsof \| grep <kafka process ID> \| grep TCP \| wc -l | total connections for a Kafka process |
| lsof \| grep <kafka process ID> \| grep jars \| wc -l | total jars for a Kafka process |
| lsof \| grep <kafka process ID> \| grep PYFTSINCREMENTALINDEXER-3 \| wc -l | number of threads for a given topic |
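To gather everything in the table in one pass, the following Linux sketch uses lsof -p, which is roughly equivalent to the grep form above; the pgrep pattern is an assumption:

# Collect file descriptor details for the broker in one report
KAFKA_PID=$(pgrep -f kafka | head -n 1)   # assumption: command line contains "kafka"
echo "System-wide maximum:"; cat /proc/sys/fs/file-max
echo "Process limits:";      cat /proc/"$KAFKA_PID"/limits
echo "In use (system):";     cat /proc/sys/fs/file-nr
echo "TCP connections:";     lsof -p "$KAFKA_PID" | grep TCP | wc -l
echo "Jar files:";           lsof -p "$KAFKA_PID" | grep jars | wc -l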