我正在使用融合的kafka平台。我有一个主题有4个分区和复制因子2.单个zookeeper,三个代理和kafka-rest代理服务器。现在我负载测试系统的siege运行1000个用户的api列表反过来命中kafka生产者。我有我的生产者和消费者使用其余代理(kafka-rest)。我遇到了以下问题:
{ [Error: getaddrinfo EMFILE] code: 'EMFILE', errno: 'EMFILE', syscall: 'getaddrinfo' }
在kafka-rest日志中我可以看到:
[2016-02-23 07:13:51,972] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 14 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,973] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 15 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,974] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 12 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,978] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 6 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,983] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 6 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,984] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 4 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,985] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 7 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,993] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 3 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,994] INFO 127.0.0.1 - - [23/Feb/2016:07:13:51 +0000] "POST /topics/endsession HTTP/1.1" 200 120 4 (io.confluent.rest-utils.requests:77)
[2016-02-23 07:13:51,999] WARN Accept failed for channel java.nio.channels.SocketChannel[closed] (org.eclipse.jetty.io.SelectorManager:714)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processAccept(SelectorManager.java:706)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processKey(SelectorManager.java:648)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:611)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:549)
at org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
所以我经历了很多与此相关的问题。设置我的ec2机器参数,以便我得不到too many open file error
。但它没有解决。我已将TIME_WAIT
缩短为30秒。 ulimit -n
是80000。
我已经收集了一些统计数据,看起来像kafka rest proxy,它运行在`localhost:8082上,导致连接太多。 我该如何解决这个问题?有时当错误来临然后我停止我的围攻测试但是当TIME_WAIT连接减少时,我再次用1个用户重新启动负载测试我仍然看到同样的问题。在节点js的rest代理包装器中有一些问题?
`
答案 0 :(得分:0)
您需要增加该进程的ulimit。要检查特定进程的ulimit,请运行以下命令: sudo cat / proc /< process_id > / limits
为了通过supervisord增加进程运行的ulimit,你可以在 supervisord.conf中增加 minfds