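Too many open files on an "unlimited" system

Time: 2014-03-12 14:50:43

Tags: java linux limit netty lsof

I am seeing many "too many open files" exceptions while my program runs. They typically take the following form: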

org.jboss.netty.channel.ChannelException: Failed to create a selector.

...
Caused by: java.io.IOException: Too many open files

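These are not the only exceptions, however. I have observed similar ones (also caused by "Too many open files"), but they are much less frequent.

Strangely, I have set the open-file limit of the screen session (from which I launch my program) to 1M: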

root@s11:~/fabiim-cbench# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
**open files                      (-n) 1000000**
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Moreover, judging by the output of lsof -p, I never see more than 1111 open files (sockets, pipes, files) before the exceptions are thrown.
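The count reported by lsof can be cross-checked from inside the JVM itself. The following is a minimal sketch, assuming a Linux system where /proc/self/fd is available; the class name OpenFdCounter and the method countOpenFds are hypothetical names chosen for illustration, not part of the original program.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original program.
class OpenFdCounter {
    /**
     * Counts the entries in /proc/self/fd (Linux only).
     * Returns -1 if the directory is missing or cannot be read.
     * The count includes the descriptor opened to list the directory.
     */
    static long countOpenFds() {
        Path fdDir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(fdDir)) {
            return -1;
        }
        long count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
            for (Path ignored : entries) {
                count++;
            }
        } catch (IOException e) {
            return -1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + countOpenFds());
    }
}
```

Logging this value periodically, or right where the exception is caught, would show whether the descriptor count really stays near the figure lsof reports.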

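Question: what could be wrong, and/or how can I dig deeper into this problem?

Extra: I am currently integrating Floodlight with bft-smart. In short, the Floodlight process is the one that hits the "too many open files" exceptions during a stress test launched by a benchmark program. The benchmark program maintains 64 TCP connections to the Floodlight process, which in turn should maintain at least 64 * 3 TCP connections to the bft-smart replicas. Both programs use Netty to manage these connections.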

1 Answer:

Answer 0: (score: 4)

First thing to check: can you run ulimit from inside the Java process to make sure the file limit is the same there? Code like this should work:

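// Run "ulimit -a" in a child shell, which inherits the JVM's limits, and echo its output.
// Needs java.io.InputStream; the enclosing method must handle IOException.
InputStream is = Runtime.getRuntime().exec(new String[] {"bash", "-c", "ulimit -a"}).getInputStream();
int c;
while ((c = is.read()) != -1) {
    System.out.write(c);
}
System.out.flush();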

If the limit still shows 1 million, well, you are in for some hard debugging.
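As an alternative to spawning a shell, the limit the JVM actually runs under can be read from /proc/self/limits. This is a sketch that assumes a Linux system and the usual layout of that file (limit name, soft limit, hard limit, units); OpenFileLimit and softLimit are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class OpenFileLimit {
    private static final String LABEL = "Max open files";

    /**
     * Extracts the soft "Max open files" value from the lines of a
     * /proc/<pid>/limits file. Returns null if the line is absent.
     */
    static String softLimit(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith(LABEL)) {
                // Remaining columns: soft limit, hard limit, units.
                String[] cols = line.substring(LABEL.length()).trim().split("\\s+");
                return cols[0].isEmpty() ? null : cols[0];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines =
                Files.readAllLines(Paths.get("/proc/self/limits"), StandardCharsets.UTF_8);
        System.out.println("soft open-file limit seen by the JVM: " + softLimit(lines));
    }
}
```

If the value printed here differs from what ulimit -a shows in the launching shell, the limit is being changed somewhere between the shell and the JVM.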

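Here are a few things I would look into if I had to debug this:

  1. Are you running out of TCP port numbers? What does netstat -an show when you hit this error?

  2. Use strace to find out exactly which system call, with which arguments, causes this error. The errno value of EMFILE is 24.

  3. The "Too many open files" EMFILE error can actually be returned by many different system calls, for many different reasons: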

    $ cd /usr/share/man/man2
    $ zgrep -A 2 EMFILE *
    accept.2.gz:.B EMFILE
    accept.2.gz:The per-process limit of open file descriptors has been reached.
    accept.2.gz:.TP
    accept.2.gz:--
    accept.2.gz:.\" EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE,
    accept.2.gz:.\" ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK.
    accept.2.gz:.\" In addition, SUSv2 documents EFAULT and ENOSR.
    dup.2.gz:.B EMFILE
    dup.2.gz:The process already has the maximum number of file
    dup.2.gz:descriptors open and tried to open a new one.
    epoll_create.2.gz:.B EMFILE
    epoll_create.2.gz:The per-user limit on the number of epoll instances imposed by
    epoll_create.2.gz:.I /proc/sys/fs/epoll/max_user_instances
    eventfd.2.gz:.B EMFILE
    eventfd.2.gz:The per-process limit on open file descriptors has been reached.
    eventfd.2.gz:.TP
    execve.2.gz:.B EMFILE
    execve.2.gz:The process has the maximum number of files open.
    execve.2.gz:.TP
    execve.2.gz:--
    execve.2.gz:.\" document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL,
    execve.2.gz:.\" EISDIR or ELIBBAD error conditions.
    execve.2.gz:.SH NOTES
    fcntl.2.gz:.B EMFILE
    fcntl.2.gz:For
    fcntl.2.gz:.BR F_DUPFD ,
    getrlimit.2.gz:.BR EMFILE .
    getrlimit.2.gz:(Historically, this limit was named
    getrlimit.2.gz:.B RLIMIT_OFILE
    inotify_init.2.gz:.B EMFILE
    inotify_init.2.gz:The user limit on the total number of inotify instances has been reached.
    inotify_init.2.gz:.TP
    mmap.2.gz:.\" SUSv2 documents additional error codes EMFILE and EOVERFLOW.
    mmap.2.gz:.SH AVAILABILITY
    mmap.2.gz:On POSIX systems on which
    mount.2.gz:.B EMFILE
    mount.2.gz:(In case no block device is required:)
    mount.2.gz:Table of dummy devices is full.
    open.2.gz:.B EMFILE
    open.2.gz:The process already has the maximum number of files open.
    open.2.gz:.TP
    pipe.2.gz:.B EMFILE
    pipe.2.gz:Too many file descriptors are in use by the process.
    pipe.2.gz:.TP
    shmop.2.gz:.\" SVr4 documents an additional error condition EMFILE.
    shmop.2.gz:
    shmop.2.gz:In SVID 3 (or perhaps earlier)
    signalfd.2.gz:.B EMFILE
    signalfd.2.gz:The per-process limit of open file descriptors has been reached.
    signalfd.2.gz:.TP
    socket.2.gz:.B EMFILE
    socket.2.gz:Process file table overflow.
    socket.2.gz:.TP
    socketpair.2.gz:.B EMFILE
    socketpair.2.gz:Too many descriptors are in use by this process.
    socketpair.2.gz:.TP
    spu_create.2.gz:.B EMFILE
    spu_create.2.gz:The process has reached its maximum open files limit.
    spu_create.2.gz:.TP
    timerfd_create.2.gz:.B EMFILE
    timerfd_create.2.gz:The per-process limit of open file descriptors has been reached.
    timerfd_create.2.gz:.TP
    truncate.2.gz:.\" error conditions EMFILE, EMULTIHP, ENFILE, ENOLINK.  SVr4 documents for
    truncate.2.gz:.\" .BR ftruncate ()
    truncate.2.gz:.\" an additional EAGAIN error condition.
    

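    If you go through all of these man pages by hand, you may find something interesting. For example, I find it interesting that epoll_create, the underlying system call used by NIO channels, returns EMFILE "Too many open files" if

      The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered. See epoll(7) for further details.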

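    Now, that file name does not actually exist on my system, but there are some limits defined in the files under /proc/sys/fs/epoll and /proc/sys/fs/inotify that you may be hitting, especially if you are running multiple instances of the same test on the same machine. Figuring out whether that is the case is a chore in itself. You could start by checking syslog for any messages...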

  4. Good luck!
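The epoll and inotify limits mentioned in item 3 can be dumped with a few lines of Java. This is only a sketch: it assumes a Linux system, lists whatever files happen to exist under the two directories (their names vary between kernel versions), and uses the hypothetical names KernelFsLimits and readLimits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original answer.
class KernelFsLimits {
    /**
     * Reads every regular file directly under dir into a sorted
     * name -> trimmed-content map. Returns an empty map if dir is missing.
     */
    static Map<String, String> readLimits(Path dir) {
        Map<String, String> limits = new TreeMap<>();
        if (!Files.isDirectory(dir)) {
            return limits;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                try {
                    String value =
                            new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                    limits.put(file.getFileName().toString(), value);
                } catch (IOException unreadable) {
                    // Skip entries this user is not allowed to read.
                }
            }
        } catch (IOException e) {
            // Listing failed part-way: return whatever was collected so far.
        }
        return limits;
    }

    public static void main(String[] args) {
        for (String dir : new String[] {"/proc/sys/fs/epoll", "/proc/sys/fs/inotify"}) {
            System.out.println(dir + " -> " + readLimits(Paths.get(dir)));
        }
    }
}
```

These limits are per user rather than per process, so comparing them against the number of selectors created by all test instances running under the same account may be more telling than the per-process descriptor count.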