Nameko / RabbitMQ: OSError: Server unexpectedly closed connection

Time: 2019-12-29 20:44:20

Tags: python docker kubernetes rabbitmq nameko

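I have two nameko services that communicate over RPC via RabbitMQ. Locally, with docker-compose, everything works fine. I then deployed everything to a Kubernetes / Istio cluster on DigitalOcean and started getting the error below. It recurs continuously, once every 10/20/60 minutes. Communication between the services works fine (I believe both before and after the reconnection), but the logs are cluttered with these unexpected reconnections, which should not be happening.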

Helm RabbitMQ configuration file

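I tried increasing the RAM and CPU allocation (up to the values in the configuration file above: 512Mb and 400m), but the behavior is the same.

Note: after deployment I did not touch any of the services, sent no messages and made no requests, and the error first appeared after about 60 minutes. It keeps showing up in the logs afterwards.

Nameko service logs: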

"Connection to broker lost, trying to re-establish connection...",
"exc_info": "Traceback (most recent call last):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 175, in run
for _ in self.consume(limit=None, **kwargs):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 197, in consume
conn.drain_events(timeout=safety_interval)
File \"/usr/local/lib/python3.6/site-packages/kombu/connection.py\", line 323, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File \"/usr/local/lib/python3.6/site-packages/kombu/transport/pyamqp.py\", line 103, in drain_events
return connection.drain_events(**kwargs)
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 505, in drain_events
while not self.blocking_read(timeout):
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 510, in blocking_read
frame = self.transport.read_frame()
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 252, in read_frame
frame_header = read(7, True)
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 446, in _read
raise IOError('Server unexpectedly closed connection')
OSError: Server unexpectedly closed connection"}
{"name": "kombu.mixins", "asctime": "29/12/2019 20:22:54", "levelname": "INFO", "message": "Connected to amqp://user:**@rabbit-rabbitmq:5672//"}

RabbitMQ logs:

2019-12-29 20:22:54.563 [warning] <0.718.0> closing AMQP connection <0.718.0> (127.0.0.1:46504 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.563 [warning] <0.705.0> closing AMQP connection <0.705.0> (127.0.0.1:46502 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.681 [info] <0.3424.0> accepting AMQP connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672)
2019-12-29 20:22:54.689 [info] <0.3424.0> connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'
2019-12-29 20:22:54.690 [info] <0.3431.0> accepting AMQP connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672)
2019-12-29 20:22:54.696 [info] <0.3431.0> connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'

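3 answers:

Answer 0: (score: 3)

The problem is the Istio proxy that gets injected as a sidecar container into the RabbitMQ pod. You need to exclude RabbitMQ from Istio proxy injection, and then it will work.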
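One way to opt a workload out of injection is the `sidecar.istio.io/inject: "false"` annotation on its pod template. The sketch below only builds a merge patch and the matching `kubectl patch` command line; it does not run anything. The StatefulSet name `rabbit-rabbitmq` (guessed from the broker host name in the logs) and the `default` namespace are assumptions, not details from the question.

```python
import json
import shlex

# Assumed names: adjust to whatever the Helm release actually created.
STATEFULSET = "rabbit-rabbitmq"
NAMESPACE = "default"


def sidecar_opt_out_patch():
    """Return a merge patch that annotates the pod template to skip Istio injection."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"sidecar.istio.io/inject": "false"}
                }
            }
        }
    }


def kubectl_patch_command(name, namespace, patch):
    """Return the kubectl argument list that would apply the patch."""
    return [
        "kubectl", "patch", "statefulset", name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

Calling `print(as_shell(kubectl_patch_command(STATEFULSET, NAMESPACE, sidecar_opt_out_patch())))` prints the command to review before running it. Changing the pod template triggers a rollout, so the RabbitMQ pod is recreated without the sidecar.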

Answer 1: (score: 0)

I think this is related to this.

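Try installing the netstat utility and running it to see whether there are too many connections in states other than ESTABLISHED.

Also try adding these to your sysctl settings: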
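A small sketch of that check, assuming the usual six-column layout of `netstat -tan` output on Linux. The sample text is made up for illustration and is not taken from the cluster in the question.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -tan` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def suspicious_states(counts):
    """Keep only states other than ESTABLISHED and LISTEN."""
    return {s: n for s, n in counts.items() if s not in ("ESTABLISHED", "LISTEN")}


# Illustrative sample only.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:5672            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:5672          127.0.0.1:46504         ESTABLISHED
tcp        0      0 127.0.0.1:5672          127.0.0.1:46502         TIME_WAIT
tcp        0      0 127.0.0.1:5672          127.0.0.1:43466         TIME_WAIT
"""
```

On a real host the input could come from `subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout`; a large TIME_WAIT or CLOSE_WAIT count is the kind of thing to look for.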

net.ipv4.tcp_fin_timeout = 30

net.ipv4.tcp_keepalive_time=30
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=4

net.ipv4.tcp_tw_reuse = 1

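See this.

Answer 2: (score: 0)

Have you tried increasing the heartbeat for the connection? Most likely your connection is being terminated at a lower level due to inactivity.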
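Before changing anything, it may help to compare the current kernel values with the suggested ones. This is a minimal sketch, assuming a Linux host where each sysctl key maps to a file under `/proc/sys` (dots become path separators); the helper names are made up for this example.

```python
import os

# Values suggested in this answer.
SUGGESTED = {
    "net.ipv4.tcp_fin_timeout": 30,
    "net.ipv4.tcp_keepalive_time": 30,
    "net.ipv4.tcp_keepalive_intvl": 10,
    "net.ipv4.tcp_keepalive_probes": 4,
    "net.ipv4.tcp_tw_reuse": 1,
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file path under root."""
    return os.path.join(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the integer value of a sysctl key, or None if it cannot be read."""
    try:
        with open(sysctl_path(key, root)) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def diff_from_suggested(root="/proc/sys"):
    """Return {key: (current, suggested)} for keys whose value differs."""
    result = {}
    for key, wanted in SUGGESTED.items():
        current = read_sysctl(key, root)
        if current != wanted:
            result[key] = (current, wanted)
    return result
```

Inside a container these values are per network namespace, so the check is most meaningful when run in the same pod as the process holding the connection.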
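The reasoning can be checked with simple arithmetic. The sketch below assumes that heartbeat frames are sent at roughly half the negotiated heartbeat timeout, and that some hop between client and broker (a proxy, load balancer, or NAT) drops connections that stay silent for `idle_timeout_s` seconds. Both function names and all the numbers are hypothetical.

```python
def heartbeat_send_interval(heartbeat_timeout_s):
    """Approximate seconds between heartbeat frames (assumed: timeout / 2)."""
    return heartbeat_timeout_s / 2


def survives_idle_timeout(heartbeat_timeout_s, idle_timeout_s):
    """True if heartbeats arrive often enough to keep the connection non-idle.

    A heartbeat timeout of 0 is treated as "heartbeats disabled".
    """
    if heartbeat_timeout_s <= 0:
        return False
    return heartbeat_send_interval(heartbeat_timeout_s) < idle_timeout_s
```

For example, `survives_idle_timeout(60, 3600)` is true while `survives_idle_timeout(0, 3600)` is false. Under these assumptions, an otherwise idle connection with heartbeats disabled, or with a very long heartbeat timeout, is the one at risk.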

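Also make sure you have enough resources to run all the containers on the host.

I had a similar problem, and I am not sure which of the following fixed it for me:

  1. Proper resource management
  2. Creating an entry point in the Dockerfile that is a bash script, which runs the file containing the code that is supposed to execute in an infinite loop. (I know this one fixed a memory leak: the bash script ran the file with your code; the code listened for a message, received it, processed it, and exited; then the bash script loaded it again. My worker restarted after every message, that is, the whole worker exited and a new one started. Not a good idea.)

Hope this gets you somewhere.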
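The restart loop from item 2 can be sketched in Python instead of bash. This is an illustration of the idea rather than the original script: `run_with_restarts`, `max_runs`, and `delay_s` are names invented for this example, and `worker.py` in the usage note is a hypothetical file.

```python
import subprocess
import time


def run_with_restarts(cmd, max_runs=None, delay_s=1.0):
    """Run cmd repeatedly, restarting it every time it exits.

    max_runs=None loops forever, as the entrypoint script would;
    a number makes the loop finite. Returns the list of exit codes.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        if exit_codes:
            # Short pause between restarts to avoid a tight crash loop.
            time.sleep(delay_s)
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
    return exit_codes
```

As an entrypoint this would be called as `run_with_restarts([sys.executable, "worker.py"])`. As the answer itself points out, restarting the whole worker after every message is heavy-handed; in Kubernetes the pod's restart policy already covers the crash case.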