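EDIT: It looks like the second server occasionally gets this error as well, which makes me almost certain it's a configuration issue. It's possibly one of the following: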
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse =1
Version info (as requested):
Meteor: 1.5.0
OS: Ubuntu 16.04
Provider: AWS EC2
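I'm intermittently, and seemingly at random, getting the following error in the two processes running on one server (of a pair). The other server never gets this error, and the error doesn't reference any code I wrote, so I can only assume it's either (a) a bug in Meteor or (b) a misconfiguration on my server. The server whose processes crash also hosts two other Meteor sites, which occasionally get this error too: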
Error: write after end
at writeAfterEnd (_stream_writable.js:167:12)
at PassThrough.Writable.write (_stream_writable.js:212:5)
at IncomingMessage.ondata (_stream_readable.js:542:20)
at emitOne (events.js:77:13)
at IncomingMessage.emit (events.js:169:7)
at IncomingMessage.Readable.read (_stream_readable.js:368:10)
at flow (_stream_readable.js:759:26)
at resume_ (_stream_readable.js:739:3)
at nextTickCallbackWith2Args (node.js:511:9)
at process._tickDomainCallback (node.js:466:17)
Things I've already checked:
sysctl.conf: below is the sysctl.conf of the failing server. However, the server that works fine has the same configuration.
fs.file-max = 1000000
fs.nr_open = 1000000
ifs.file-max = 70000
net.nf_conntrack_max = 1048576
net.ipv4.netfilter.ip_conntrack_max = 32768
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_max_orphans = 8192
net.ipv4.ip_local_port_range = 16768 61000
net.ipv4.tcp_max_syn_backlog = 10024
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse =1
net.core.somaxconn = 20048
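Note that this file sets net.ipv4.ip_local_port_range twice (16768 61000, then 1024 65535). When the file is applied with sysctl -p, lines are processed in order, so the later value is the one in effect. A minimal sketch for spotting such duplicates, assuming the file consists of plain key = value lines with # or ; comments; parseSysctl is a hypothetical helper, not part of any library:

```typescript
// Hypothetical helper: parse sysctl.conf-style text and report keys set more than once.
function parseSysctl(text: string): { values: Map<string, string>; duplicates: string[] } {
  const values = new Map<string, string>(); // last assignment wins, as when the file is applied in order
  const duplicates: string[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and comments (sysctl.conf allows '#' and ';').
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a key = value line
    const key = line.slice(0, eq).trim();
    // Normalise internal whitespace so "16768\t61000" and "16768 61000" compare equal.
    const value = line.slice(eq + 1).trim().replace(/\s+/g, " ");
    if (values.has(key) && !duplicates.includes(key)) duplicates.push(key);
    values.set(key, value);
  }
  return { values, duplicates };
}

// Example: a few lines taken from the sysctl.conf above.
const conf = [
  "net.ipv4.ip_local_port_range = 16768 61000",
  "net.ipv4.tcp_tw_reuse =1",
  "net.ipv4.ip_local_port_range = 1024 65535",
].join("\n");

const { values, duplicates } = parseSysctl(conf);
for (const key of duplicates) {
  console.log(`${key} is set more than once; effective value: ${values.get(key)}`);
}
```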
I have an NGINX load balancer on server1 that balances across 4 processes (2 per server). The NGINX error log contains lines like this one:
2017/08/17 16:15:01 [warn] 1221#1221: *6233472 an upstream response is buffered to a temporary file /var/lib/nginx/proxy/1/46/0000029461 while reading upstream, client: 164.68.80.47, server: server redacted, request: "GET path redacted HTTP/1.1", upstream: "path redacted", host: "host redacted", referrer: "referrer redacted"
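At the time of the error, I see a pair of lines like these (111: Connection refused means nothing was accepting connections on 127.0.0.1:8080 at that moment, which is consistent with the upstream process on that port having just crashed):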
2017/08/17 15:07:19 [error] 1222#1222: *6215301 connect() failed (111: Connection refused) while connecting to upstream, client: ip redacted, server: server redacted, request: "GET /admin/sockjs/info?cb=o2ziavvsua HTTP/1.1", upstream: "http://127.0.0.1:8080/admin/sockjs/info?cb=o2ziavvsua", host: "hostname redacted", referrer: "referrer redacted"
2017/08/17 15:07:19 [warn] 1222#1222: *6215301 upstream server temporarily disabled while connecting to upstream, client: ip redacted, server: server redacted, request: "GET /admin/sockjs/info?cb=o2ziavvsua HTTP/1.1", upstream: "http://127.0.0.1:8080/admin/sockjs/info?cb=o2ziavvsua", host: "hostname redacted", referrer: "referrer redacted"
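In case it matters, I'm using a 3-node Mongo replica set, and both servers point to all 3 nodes.
I'm also using a self-hosted version of Kadira (since the hosted service has gone offline).
If the errors can't be prevented, is there any way to stop them from taking down the whole process? Sometimes 50-100 users are connected to each process, and kicking them all off because of a single error seems excessive.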
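The stack trace shows an IncomingMessage piping data into a PassThrough stream that has already been ended. In Node, an EventEmitter that emits 'error' with no listener attached throws, and because this happens inside a process.nextTick callback nothing catches it, so the process exits. A minimal sketch of the mechanism, assuming you control the stream in question (Meteor's internal stream is not necessarily reachable from app code): attaching an 'error' listener turns the same condition into a logged event instead of a crash.

```typescript
import { PassThrough } from "stream";

// Collect the messages of any errors the stream reports.
const seen: string[] = [];

const sink = new PassThrough();
// Without this listener, the 'error' emitted by the write below would be thrown
// and, outside any try/catch, would terminate the process.
sink.on("error", (err: Error) => {
  seen.push(err.message);
  console.error(`stream error (logged, not fatal): ${err.message}`);
});

sink.end();               // the writable side is now finished
sink.write("late chunk"); // reproduces "write after end"
```

Node's documentation warns that it is not safe to resume normal operation after process.on('uncaughtException'), so a listener on the specific stream is the more targeted option.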
Answer:
It has now been two days without a crash, so I think the fix was changing:
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
to
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
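After editing /etc/sysctl.conf and reloading it with sysctl -p, it is worth confirming that the running kernel actually reports the new values. A minimal sketch, assuming the standard Linux convention that each sysctl key maps to a file under /proc/sys with dots replaced by slashes; procPath and readSysctl are hypothetical helpers:

```typescript
import { existsSync, readFileSync } from "fs";

// Map a sysctl key to its /proc/sys file: dots become path separators.
// Assumes no key component itself contains a dot (true for the keys checked here).
function procPath(key: string): string {
  return "/proc/sys/" + key.split(".").join("/");
}

// Read the live value, or null if the key does not exist on this kernel
// (tcp_tw_recycle, for example, was removed in Linux 4.12).
function readSysctl(key: string): string | null {
  const path = procPath(key);
  if (!existsSync(path)) return null;
  return readFileSync(path, "utf8").trim().replace(/\s+/g, " ");
}

// The values from the answer above.
const wanted: Record<string, string> = {
  "net.ipv4.tcp_fin_timeout": "15",
  "net.ipv4.tcp_tw_recycle": "0",
  "net.ipv4.tcp_tw_reuse": "0",
};

for (const [key, expected] of Object.entries(wanted)) {
  const actual = readSysctl(key);
  console.log(`${key}: expected ${expected}, actual ${actual ?? "(not present)"}`);
}
```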
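I don't know which one was causing the problem (possibly the timeout). I still think it's a "bug" that a single "write after end" error brings down the entire Meteor process. Perhaps it should just be logged.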