Chef server crashes routinely

Asked: 2017-07-28 14:50:09

Tags: chef

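We have a standalone Chef server installation (v12.1.2). It ran fine for several months, but recently it has started crashing several times a day. Judging by the logs, it is the opscode-erchef service that crashes. This is from the opscode-erchef crash log: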

    2017-07-28 08:44:26 =ERROR REPORT====
["Could not connect, scheduling reconnect.",{error,{{error,{badmatch,{error,{auth_failure_likely,{econnrefused,{gen_server,call,[<0.2016.0>,connect,infinity]}}}}},[{bunny_util,connect,1,[{file,"src/bunny_util.erl"},{line,191}]},{gen_bunny_mon,do_connect,3,[{file,"src/gen_bunny_mon.erl"},{line,192}]},{gen_bunny_mon,handle_info,2,[{file,"src/gen_bunny_mon.erl"},{line,134}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},{connection_info,{network,{127,0,0,1},5672,{<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>},<<"/analytics">>}}}}]
2017-07-28 08:44:26 =ERROR REPORT====
Could not start the network driver: econnrefused
2017-07-28 08:44:26 =ERROR REPORT====
** Generic server <0.2019.0> terminating
** Last message in was connect
** When Server state == {state,<0.2017.0>,{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},undefined,undefined,undefined,undefined,undefined,undefined,<0.2018.0>,false,undefined,{{0,nil},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},undefined,#Fun<amqp_connection_sup.0.94524864>}
** Reason for termination ==
** {econnrefused,[{amqp_network_connection,do_connect,1,[{file,"src/amqp_network_connection.erl"},{line,337}]},{amqp_network_connection,handle_call,3,[{file,"src/amqp_network_connection.erl"},{line,93}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
2017-07-28 08:44:26 =CRASH REPORT====
  crasher:
    initial call: amqp_network_connection:init/1
    pid: <0.2019.0>
    registered_name: []
    exception exit: {econnrefused,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,804}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
    ancestors: [<0.2017.0>,gen_bunny_mon,gen_bunny_sup,<0.1531.0>]
    messages: []
    links: [<0.2017.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 531
  neighbours:
2017-07-28 08:44:26 =SUPERVISOR REPORT====
     Supervisor: {<0.2017.0>,amqp_connection_sup}
     Context:    child_terminated
     Reason:     econnrefused
     Offender:   [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}]

2017-07-28 08:44:26 =SUPERVISOR REPORT====
     Supervisor: {<0.2017.0>,amqp_connection_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}]

Restarting opscode-erchef, followed by the opscode-expander service, brings it back up again.

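Can anyone tell me under what circumstances the opscode-erchef service would crash? When this happens I don't see any pressure on CPU or memory, so server resources don't seem to be the problem.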

Thanks!

1 answer:

Answer 0 (score: 0)

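The error is related to RabbitMQ and its workers. Try increasing the number of connections available to RabbitMQ, or raising the connection timeout, using the following settings: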
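Before changing any settings, it may help to confirm whether anything is actually listening on the address from the crash log (`127.0.0.1`, port `5672`), since `econnrefused` usually means the connection was rejected outright. The following is a minimal sketch using only Ruby's standard library; the helper name `port_open?` and the 2-second timeout are assumptions made for illustration, not part of Chef.

```ruby
require 'socket'

# Hypothetical helper: returns true if a TCP connection to host:port
# can be opened within `timeout` seconds, false otherwise.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Errno::EHOSTUNREACH, SocketError
  false
end

# Address and port taken from the connection_info in the crash log.
if port_open?('127.0.0.1', 5672)
  puts 'RabbitMQ port 5672 is accepting connections'
else
  puts 'Nothing is accepting connections on 127.0.0.1:5672'
end
```

Running this at the moment opscode-erchef goes down would show whether RabbitMQ itself has stopped accepting connections, as opposed to erchef failing for some other reason.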

rabbitmq['rabbit_mgmt_http_max_count']

  

The maximum worker count for the HTTP connection pool used by the rabbitmq-management plugin. Default value: 100.

rabbitmq['rabbit_mgmt_timeout']

  

The timeout for the HTTP connection pool used by the rabbitmq-management plugin. Default value: 30000.

To learn how to change these tunables and other settings, see here.
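As a rough sketch of how the two settings above might be applied: on a standalone install they would normally go into `/etc/opscode/chef-server.rb`, followed by `chef-server-ctl reconfigure` to apply them. The values below are illustrative assumptions (double the stated defaults), not tested recommendations.

```ruby
# /etc/opscode/chef-server.rb
# Illustrative values only; the defaults are 100 and 30000 respectively.

# Allow more workers in the rabbitmq-management HTTP connection pool.
rabbitmq['rabbit_mgmt_http_max_count'] = 200

# Give the rabbitmq-management HTTP connection pool longer before timing out.
rabbitmq['rabbit_mgmt_timeout'] = 60000
```

It is probably safest to change one value at a time and watch the opscode-erchef crash log afterwards, so that any effect can be attributed to a specific setting.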