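We have a standalone Chef server installation (v12.1.2). It had been running fine for several months, but recently it started crashing several times a day. Judging by the logs, it is the "opscode-erchef" service that crashes. This is from the opscode-erchef crash log: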
2017-07-28 08:44:26 =ERROR REPORT====
["Could not connect, scheduling reconnect.",{error,{{error,{badmatch,{error,{auth_failure_likely,{econnrefused,{gen_server,call,[<0.2016.0>,connect,infinity]}}}}},[{bunny_util,connect,1,[{file,"src/bunny_util.erl"},{line,191}]},{gen_bunny_mon,do_connect,3,[{file,"src/gen_bunny_mon.erl"},{line,192}]},{gen_bunny_mon,handle_info,2,[{file,"src/gen_bunny_mon.erl"},{line,134}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},{connection_info,{network,{127,0,0,1},5672,{<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>},<<"/analytics">>}}}}]
2017-07-28 08:44:26 =ERROR REPORT====
Could not start the network driver: econnrefused
2017-07-28 08:44:26 =ERROR REPORT====
** Generic server <0.2019.0> terminating
** Last message in was connect
** When Server state == {state,<0.2017.0>,{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},undefined,undefined,undefined,undefined,undefined,undefined,<0.2018.0>,false,undefined,{{0,nil},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},undefined,#Fun<amqp_connection_sup.0.94524864>}
** Reason for termination ==
** {econnrefused,[{amqp_network_connection,do_connect,1,[{file,"src/amqp_network_connection.erl"},{line,337}]},{amqp_network_connection,handle_call,3,[{file,"src/amqp_network_connection.erl"},{line,93}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
2017-07-28 08:44:26 =CRASH REPORT====
crasher:
initial call: amqp_network_connection:init/1
pid: <0.2019.0>
registered_name: []
exception exit: {econnrefused,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,804}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
ancestors: [<0.2017.0>,gen_bunny_mon,gen_bunny_sup,<0.1531.0>]
messages: []
links: [<0.2017.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 531
neighbours:
2017-07-28 08:44:26 =SUPERVISOR REPORT====
Supervisor: {<0.2017.0>,amqp_connection_sup}
Context: child_terminated
Reason: econnrefused
Offender: [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}]
2017-07-28 08:44:26 =SUPERVISOR REPORT====
Supervisor: {<0.2017.0>,amqp_connection_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}]
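Restarting opscode-erchef, followed by the opscode-expander service, brings it back again.
Can anyone tell me under what circumstances the opscode-erchef service would crash? When this happens, I don't see any pressure on CPU or memory, so server resources don't seem to be the problem.
Thanks!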
Answer 0 (score: 0)
The error is related to RabbitMQ and its workers; try increasing the number of connections available to RabbitMQ or adjusting the timeout.
rabbitmq['rabbit_mgmt_http_max_count']
The maximum worker count for the HTTP connection pool used by the rabbitmq-management plugin. Default value: 100.
rabbitmq['rabbit_mgmt_timeout']
The timeout for the HTTP connection pool used by the rabbitmq-management plugin. Default value: 30000.
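As a rough sketch, the two settings above would normally go in the Chef server configuration file (on a standalone install, /etc/opscode/chef-server.rb), followed by `chef-server-ctl reconfigure`. The values below are illustrative assumptions, not recommendations, and it is not confirmed that raising them resolves this particular crash.

```ruby
# /etc/opscode/chef-server.rb
# Illustrative values only (assumptions); the defaults are 100 and 30000.

# Maximum worker count for the rabbitmq-management HTTP connection pool.
rabbitmq['rabbit_mgmt_http_max_count'] = 200

# Timeout for the rabbitmq-management HTTP connection pool.
rabbitmq['rabbit_mgmt_timeout'] = 60000
```

After editing the file, `chef-server-ctl reconfigure` applies the change.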
To learn how to change these tunable settings and others, see here.
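Since the crash log reports `econnrefused` for 127.0.0.1:5672, it may also help to confirm whether anything is accepting connections on that port at the moment erchef fails. The following is a minimal sketch using only Ruby's standard library; the helper name `port_open?` and the 2-second timeout are assumptions chosen for illustration, and the script only tests TCP reachability, not AMQP authentication.

```ruby
require 'socket'

# Hypothetical helper: returns true if a TCP connection to host:port
# succeeds within the timeout, false if it is refused or unreachable.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Errno::EHOSTUNREACH, SocketError
  false
end

if __FILE__ == $PROGRAM_NAME
  # Host and port taken from the connection_info in the crash log.
  host = '127.0.0.1'
  port = 5672
  status = port_open?(host, port) ? 'accepting connections' : 'not accepting connections'
  puts "#{Time.now} #{host}:#{port} #{status}"
end
```

Running this periodically (for example from cron) and comparing the timestamps with the crash log would show whether the listener on 5672 goes away before opscode-erchef does.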