Sensu remediation - restarting a failed monitored process

Time: 2017-04-06 10:01:12

Tags: sensu

We use Sensu to monitor certain processes on remote servers where the Sensu client is installed.

Is there a way to trigger a restart of a monitored process when the Sensu check fails? I found some information online about remediation handlers:

http://thesoftjaguar.com/posts/2015/06/14/sensu-remediation/

http://dev.nuclearrooster.com/2013/07/27/remediation-with-sensu/

But this does not seem to work for me; the remediation is never triggered.

There are also two Ruby scripts that could help with this, but I am not sure which one to use:

https://github.com/sensu-plugins/sensu-plugins-sensu/blob/master/bin/handler-sensu.rb

https://github.com/nstielau/sensu-community-plugins/blob/remediation/handlers/remediation/sensu.rb

Update, April 20:

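We use Sensu Enterprise. In the meantime I managed to get the remediator.rb script invoked, but it does not work properly: it fails to read the JSON response from the client and throws the following exception: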

{"timestamp":"2017-04-20T03:06:41.733000-0700","level":"error","message":"handler output","handler":{"command":"/etc/sensu/plugins/remediator.rb","type":"pipe","timeout":10,"severities":["critical","warning","unknown"],"name":"remediator"},"event":{"id":"f38cd413-575a-46f6-8845-09d713a29815"},"output":["/opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:54:in `[]': no implicit conversion of String into Integer (TypeError)\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:54:in `block in deep_merge'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:52:in `each'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:52:in `deep_merge'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:22:in `block in settings'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:22:in `each'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:22:in `reduce'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-plugin/utils.rb:22:in `settings'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:123:in `api_settings'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:131:in `api_request'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:179:in `stash_exists?'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:191:in `block (2 levels) in filter_silenced'\n\tfrom /opt/sensu/embedded/lib/ruby/2.3.0/timeout.rb:91:in `block in timeout'\n\tfrom /opt/sensu/embedded/lib/ruby/2.3.0/timeout.rb:33:in `block in catch'\n\tfrom /opt/sensu/embedded/lib/ruby/2.3.0/timeout.rb:33:in `catch'\n\tfrom 
/opt/sensu/embedded/lib/ruby/2.3.0/timeout.rb:33:in `catch'\n\tfrom /opt/sensu/embedded/lib/ruby/2.3.0/timeout.rb:106:in `timeout'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:190:in `block in filter_silenced'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:188:in `each'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:188:in `filter_silenced'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:36:in `filter'\n\tfrom /opt/sensu/embedded/lib/ruby/gems/2.3.0/gems/sensu-plugin-1.4.2/lib/sensu-handler.rb:80:in `block in <class:Handler>'\nwarning: event filtering in sensu-plugin is deprecated, see http:// bit.ly/sensu-plugin\n"]}

We use the following script as remediator.rb: https://github.com/sensu-plugins/sensu-plugins-sensu/blob/master/bin/handler-sensu.rb

The check executes fine and we get a response from the Sensu client server, but it looks like remediator.rb cannot handle it.
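The message in the trace, "no implicit conversion of String into Integer", is what Ruby raises when an Array is indexed with a String key. Since it is raised inside `deep_merge` while the settings are loaded, one possible reading (an assumption, not confirmed here) is that one of the merged settings values is an Array where a Hash was expected. The sketch below reproduces that error class with a hypothetical, simplified merge function; `naive_deep_merge` is not the sensu-plugin implementation.

```ruby
# Hypothetical, simplified stand-in for a recursive settings merge.
# It is NOT the sensu-plugin code; it only shows how the error class arises.
def naive_deep_merge(left, right)
  merged = left.dup
  right.each do |key, value|
    merged[key] =
      if left[key].is_a?(Hash) && value.is_a?(Hash)
        naive_deep_merge(left[key], value)
      else
        value
      end
  end
  merged
end

# Two Hash-shaped settings fragments merge without trouble.
ok = naive_deep_merge({ 'api' => { 'host' => 'localhost' } },
                      { 'api' => { 'port' => 4567 } })
puts ok.inspect

# An Array-shaped fragment on the left makes left[key] an Array#[] call
# with a String argument, which raises TypeError.
begin
  naive_deep_merge([{ 'api' => {} }], { 'api' => { 'port' => 4567 } })
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

If this reading applies, checking every JSON file that the handler loads for a top-level array (or an array where an object is expected) would be a reasonable first step.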

Here is the configuration:

remediator.json
{
  "handlers": {
    "remediator": {
      "command": "/etc/sensu/plugins/remediator.rb",
      "type": "pipe",
      "timeout": 10,
      "severities": ["critical", "warning", "unknown"]
    }
  }
}

The check, kept as simple as possible for testing purposes:

/etc/sensu/conf.d/checks
{
  "checks": {
    "seyren_check": {
      "command": "/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p unexisent_process",
      "interval": 30,
      "subscribers": ["trep"],
      "handlers": ["remediator","default","file"],
      "occurrences": 1,
      "refresh": 10,
      "remediation": {
        "first_remediation": {
          "occurrences": [1, 2],
          "severities": [1]
        },
        "medium_remediation": {
          "occurrences": ["3-10"],
          "severities": [1]
        },
        "heavy_remediation": {
          "occurrences": ["1+"],
          "severities": [2]
        }
      }
    },
    "first_remediation": {
      "command": "touch /etc/sensu/plugins/test_lr",
      "subscribers": ["my.machine.local"],
      "handlers": ["default"],
      "interval": 10,
      "publish": false
    },
    "medium_remediation": {
      "command": "touch /etc/sensu/plugins/test_mr",
      "subscribers": ["my.machine.local"],
      "handlers": ["default"],
      "interval": 10,
      "publish": false
    },
    "heavy_remediation": {
      "command": "touch /etc/sensu/plugins/test_hr",
      "subscribers": ["my.machine.local"],
      "handlers": ["default"],
      "interval": 10,
      "publish": false
    }
  }
}
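The `remediation` block above mixes plain integers (`1`, `2`), a range (`"3-10"`) and an open-ended spec (`"1+"`), each combined with a list of severities (in Sensu, 1 is warning and 2 is critical). A minimal sketch of how such conditions might be evaluated follows. Both `occurrence_match?` and `remediations_for` are hypothetical helpers written for illustration, and the matching rules are assumed from the shape of the config, not taken from the handler source.

```ruby
# Hypothetical helper: does one occurrence spec match the current count?
# Accepts an Integer (2), a digit string ("2"), a closed range ("3-10"),
# or an open-ended spec ("1+"). Anything else never matches.
def occurrence_match?(spec, count)
  case spec.to_s
  when /\A(\d+)\z/
    count == Regexp.last_match(1).to_i
  when /\A(\d+)-(\d+)\z/
    count.between?(Regexp.last_match(1).to_i, Regexp.last_match(2).to_i)
  when /\A(\d+)\+\z/
    count >= Regexp.last_match(1).to_i
  else
    false
  end
end

# Hypothetical helper: names of the remediation checks whose severity
# and occurrence conditions both match the current event.
def remediations_for(remediation, occurrences, severity)
  matching = remediation.select do |_name, cond|
    cond['severities'].include?(severity) &&
      cond['occurrences'].any? { |spec| occurrence_match?(spec, occurrences) }
  end
  matching.keys
end

remediation = {
  'first_remediation'  => { 'occurrences' => [1, 2],    'severities' => [1] },
  'medium_remediation' => { 'occurrences' => ['3-10'],  'severities' => [1] },
  'heavy_remediation'  => { 'occurrences' => ['1+'],    'severities' => [2] }
}

p remediations_for(remediation, 1, 1) # => ["first_remediation"]
p remediations_for(remediation, 5, 1) # => ["medium_remediation"]
p remediations_for(remediation, 4, 2) # => ["heavy_remediation"]
```

Under these assumed rules, a critical result (severity 2) only ever selects `heavy_remediation`, so the two warning-level remediations would not fire for a check that goes straight to critical.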

1 answer:

Answer 0 (score: 0)

  • We prefer to use a CM tool called Ansible. Here is the handler and the basic idea:

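    #!/usr/bin/env ruby

    require 'sensu-handler'
    require 'json'

    class Ansible < Sensu::Handler
      def handle
        # Global defaults come from the "ansible" settings scope.
        ansible = settings['ansible']['command'] || 'ansible-playbook'
        playbook = settings['ansible']['playbook'] || nil
        # Serialized event; not passed to the command in this version.
        extra_vars = JSON.generate(@event)

        # A check may override the playbook through its own "ansible" attribute.
        unless @event['check']['ansible'].nil?
          playbook = @event['check']['ansible']['playbook'] || playbook
        end

        # The configured playbook string carries its own leading space.
        command = ansible.to_s + playbook.to_s
        output = `#{command}`

        if $?.exitstatus > 0
          puts output
          exit 1
        else
          puts "SUCCESS: #{command}"
        end
      end
    end
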
  • Then create the handler config:

    {
        "handlers": {
            "ansible": {
                "type": "pipe",
                "command": "/etc/sensu/handlers/handler-ansible.rb"
            }
        }
    }

Settings for the remediation playbook:

cat conf.d/handler/config_ansible.json 
{
    "ansible": {
        "command": "/etc/sensu/scripts/provision",
        "playbook": " --tags checksum"
    }
}

  • In the check configuration, add the name of the remediation handler:

    { ...
    "handlers": ["email", "ansible", "logstash"]
    .... }

  • The command is here:

    ssh root@my_ansible_server.comp.com -o BatchMode=yes -o "StrictHostKeyChecking no" -o ConnectTimeout=10 ansible-playbook -i /etc/ansible/generic.hosts /etc/ansible/remediation.yaml ${@}
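
The lookup order in the handler above (global `ansible` settings first, then an optional per-check `ansible.playbook` override) can be tried out without a Sensu installation. The following is a minimal sketch under that assumption. `build_command` is a hypothetical helper, not part of sensu-plugin, and unlike the handler it joins the parts with a single space after stripping them.

```ruby
# Hypothetical helper that mirrors the handler's lookup order:
# 1. command and playbook from the global "ansible" settings,
# 2. playbook overridden by the check's own "ansible" attribute, if present.
def build_command(settings, event)
  ansible_settings = settings.fetch('ansible', {})
  ansible = ansible_settings['command'] || 'ansible-playbook'
  playbook = ansible_settings['playbook']

  check_ansible = event.fetch('check', {})['ansible']
  playbook = check_ansible['playbook'] || playbook unless check_ansible.nil?

  # Strip and join so a leading space in the config value does no harm.
  [ansible, playbook].compact.map { |part| part.to_s.strip }.join(' ')
end

settings = {
  'ansible' => {
    'command'  => '/etc/sensu/scripts/provision',
    'playbook' => ' --tags checksum'
  }
}

plain_event    = { 'check' => { 'name' => 'seyren_check' } }
override_event = { 'check' => { 'name' => 'seyren_check',
                                'ansible' => { 'playbook' => '--tags restart' } } }

puts build_command(settings, plain_event)    # /etc/sensu/scripts/provision --tags checksum
puts build_command(settings, override_event) # /etc/sensu/scripts/provision --tags restart
```

The `--tags restart` value is made up for the example. Keeping the command assembly in a small function like this makes it possible to check the override behaviour before wiring the handler into Sensu.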