Our cluster consists of 3 disc nodes in an HA configuration. All nodes have 4 CPUs and 26 GB of RAM. We are running RabbitMQ 3.6.5 on Erlang 17.3, and the only plugin enabled is the management UI.
The problem is that, typically over the course of about 3 hours, one of the servers (usually the one holding the most queues) starts to gradually eat up memory until the server crashes. This happens every day, and we can see no reason for it in the logs.
Attached is the log from the server as it piled up 21 GB of memory; at that point, the Overview pane in the management UI showed only 2 GB in use. When this happens we typically have ~400 connections, ~470 channels, 16 exchanges, 54 queues and ~300 consumers. One of the queues has a TTL set, 4 are priority queues, and all of the queues are durable.
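To put numbers on that gap, this is roughly how we intend to compare the two figures the next time it happens (a rough sketch; 399 is the OS pid reported in the node status pasted below):

# Memory as the Erlang VM itself accounts for it, in bytes
rabbitmqctl eval 'erlang:memory(total).'

# Memory as the OS sees it for the beam process (resident set size, in KB)
ps -o rss= -p 399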
After a service restart, everything goes back to normal.
Any ideas as to what might be causing this, or how we should go about debugging it? Is there a checklist of known issues we could rule out?
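One experiment we are considering for the next occurrence, assuming the memory the OS sees but the node does not report is refc binaries that simply have not been garbage-collected yet, is forcing a collection on every process and watching whether the OS-level figure drops (a sketch, not something we have verified helps):

# Force a garbage collection on every Erlang process on this node
rabbitmqctl eval '[erlang:garbage_collect(P) || P <- erlang:processes()], ok.'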
Status of node 'rabbit@scraped-node-name' ...
[{pid,399},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
{webmachine,"webmachine","1.10.3"},
{mochiweb,"MochiMedia Web Server","2.13.1"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
{rabbit,"RabbitMQ","3.6.5"},
{os_mon,"CPO CXC 138 46","2.3"},
{amqp_client,"RabbitMQ AMQP Client","3.6.5"},
{rabbit_common,[],"3.6.5"},
{mnesia,"MNESIA CXC 138 12","4.12.3"},
{ssl,"Erlang/OTP SSL application","5.3.6"},
{public_key,"Public key infrastructure","0.22.1"},
{crypto,"CRYPTO","3.4.1"},
{inets,"INETS CXC 138 49","5.10.3"},
{compiler,"ERTS CXC 138 10","5.0.2"},
{xmerl,"XML parser","1.3.7"},
{syntax_tools,"Syntax tools","1.6.16"},
{asn1,"The Erlang ASN1 compiler version 3.0.2","3.0.2"},
{ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
{sasl,"SASL CXC 138 11","2.4.1"},
{stdlib,"ERTS CXC 138 10","2.2"},
{kernel,"ERTS CXC 138 10","3.0.3"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
{memory,
[{total,2049513496},
{connection_readers,10231416},
{connection_writers,3215768},
{connection_channels,35753016},
{connection_other,14065960},
{queue_procs,430585272},
{queue_slave_procs,34912},
{plugins,525312},
{other_proc,33015816},
{mnesia,333080},
{mgmt_db,33680},
{msg_index,38121640},
{other_ets,6595504},
{binary,1432921304},
{code,27606184},
{atom,992409},
{other_system,15482223}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,10983032422},
{disk_free_limit,50000000},
{disk_free,148079415296},
{file_descriptors,
[{total_limit,32668},
{total_used,440},
{sockets_limit,29399},
{sockets_used,418}]},
{processes,[{limit,1048576},{used,4683}]},
{run_queue,0},
{uptime,117601},
{kernel,{net_ticktime,60}}]
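If it helps with diagnosis, we can also run something along these lines while memory is elevated, to see which processes reference the most binary data (a sketch using only erlang:process_info/2; sizes are in bytes, and a binary shared between processes is counted once per process):

rabbitmqctl eval '
  %% top 10 processes by total size of referenced refc binaries
  lists:sublist(
    lists:reverse(lists:sort(
      [{case erlang:process_info(P, binary) of
          {binary, Bins} -> lists:sum([Sz || {_, Sz, _} <- Bins]);
          undefined      -> 0
        end, P} || P <- erlang:processes()])),
    10).
'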