logstash -> elasticsearch: how to delete all old data before outputting new data

Posted: 2018-03-23 02:58:21

Tags: elasticsearch logstash

Once a week, Logstash fetches a batch of events and forwards them to Elasticsearch.

How can I configure Logstash so that it tells Elasticsearch to delete the old events?

Edit 2018-03-28:

Input:

{host:"host1", type:"packages", records: [{name:"pkg1", ver: "1"}, {name: "pkg2", ver: "2"},...]}
{host:"host1", type:"mounts", records: [{path:"path1", dev: "dev1"}, {path:"path2", dev: "dev2"},...]}
{host:"host1", type:"???", records: [{???}, {???},...]}
...
{host:"host2", type:"packages", records: [{name:"pkg1", ver: "1"}, {name: "pkg2", ver: "2"},...]}
{host:"host2", type: "mounts", records: [{path:"path1", dev: "dev1"}, {path:"path2", dev: "dev2"},...]}
{host:"host2", type:"???", records: [{???}, {???},...]}

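These are the various events for each host. Each event contains an array whose elements have a schema that cannot be known in advance.

To be able to search precisely on the fields inside the array, I have to split the array into multiple Elasticsearch documents.

(I know there are ways to search inside an array without splitting it; that is another story: nested objects. In my case the inner objects do not follow a fixed schema, so I cannot predefine a mapping for every inner field.)

Output: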

{host: "host1", type:"packages", record: {name: "pkg1", ver: "1"}}
{host: "host1", type:"packages", record: {name: "pkg2", ver: "2"}}
{host: "host1", type:"mounts", record: {path: "path1", dev: "dev1"}}
{host: "host1", type:"???", record: {???}}
{host: "host1", type:"???", record: {???}}
{host: "host1", type:"mounts", record: {path: "path2", dev: "dev2"}}
{host: "host2", type:"packages", record: {name: "pkg1", ver: "1"}}
{host: "host2", type:"packages", record: {name: "pkg2", ver: "2"}}
{host: "host2", type:"mounts", record: {path: "path1", dev: "dev1"}}
{host: "host2", type:"mounts", record: {path: "path2", dev: "dev2"}}
{host: "host2", type:"???", record: {???}}
{host: "host2", type:"???", record: {???}}
...

logstash.conf:

input { ... }

filter {
    split {
      # split array and save them into new multiple events
      field => "records"
    }
    mutate {
      rename => { "records" => "record" }
    }
}

output {
  elasticsearch {
    hosts => ["ELASTIC_IP:PORT"]
    index => "packages-%{+YYYY.MM.dd}"
  }
}
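For readers unfamiliar with the `split` filter, here is a plain-Ruby sketch of what the `split` + `mutate`/`rename` pair above does to a single input event: each element of `records` becomes its own event that keeps the other top-level fields (`host`, `type`) and carries the element under the key `record`. The helper name `split_records` is hypothetical, and this only models the effect on the data; it is not the plugin's implementation.

```ruby
# Hypothetical helper: models the effect of
#   split  { field => "records" }
#   mutate { rename => { "records" => "record" } }
# on one event represented as a plain Hash.
def split_records(event)
  # Assumes "records" is an Array; a missing field yields no events here.
  event.fetch("records", []).map do |rec|
    # Copy every other top-level field, then attach the single element.
    event.reject { |key, _| key == "records" }.merge("record" => rec)
  end
end

input = {
  "host" => "host1",
  "type" => "packages",
  "records" => [{ "name" => "pkg1", "ver" => "1" }, { "name" => "pkg2", "ver" => "2" }]
}

split_records(input).each { |doc| p doc }
```

One input event with N records becomes N documents, which is why the number of documents per host/type changes from week to week.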


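The problem: Elasticsearch keeps accumulating more and more old events for every type on every host.

So after fetching new data for a host, I want to delete that host's old data.

Note on what did not work:

Because the output is several documents rather than a single one, and their number varies from run to run (sometimes more, sometimes fewer), a simple update is not enough. It has to be "delete everything, then add".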


Answer:

Well, I have confirmed that the old indices can be deleted with a ruby filter.

input { ... }

filter {
  split {
    # split array and save them into new multiple events
    field => "records"
  }
  mutate {
    rename => { "records" => "record" }
  }

  ruby {
    init => "
       require 'net/http'
       require 'uri'
     "
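    code => "
      # Runs once per event: DELETE this event's index (inventory-<type>@<host>)
      # before the event itself is indexed by the output below.
      uri = URI.parse('http://docker.for.mac.localhost:19200/inventory-' + event.get('type') + '@' + event.get('host'))
      http = Net::HTTP.new(uri.host, uri.port)
      req = Net::HTTP::Delete.new(uri.request_uri)
      req.basic_auth 'elastic', 'changeme'
      # A 404 response only means the index did not exist yet
      res = http.request(req)
    "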
  }
}

output {
  elasticsearch {
    hosts => ["ELASTIC_IP:PORT"]
    index => "inventory-%{type}@%{host}"
  }
}

The key point is to give every host/type combination its own index, so that the right index is easy to find when deleting:

index => "inventory-%{type}@%{host}"
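Because the ruby code runs for every split event, it sends one DELETE per event rather than one per index. Within a single pipeline batch this is merely redundant, but if events for the same index end up in different batches, a later DELETE could remove documents already written in the same run (this depends on Logstash batching and is an assumption worth testing). Below is a minimal standalone Ruby sketch of a guard that deletes each index at most once per process. `inventory_index` and `OnceDeleter` are hypothetical names. It also assumes Logstash is restarted for each weekly run (otherwise the remembered set would suppress next week's delete), and it is not thread-safe across multiple pipeline workers. It downcases the name because Elasticsearch index names must be lowercase; the output's `index` would need the same normalization (for example via a `mutate` `lowercase` on `host` and `type`).

```ruby
require 'net/http'
require 'uri'
require 'set'

# Hypothetical helper: same naming scheme as "inventory-%{type}@%{host}",
# lowercased because Elasticsearch rejects uppercase index names.
def inventory_index(type, host)
  "inventory-#{type}@#{host}".downcase
end

# Hypothetical guard: sends at most one DELETE per index name per process.
class OnceDeleter
  def initialize(base_url, user, password)
    @base = URI.parse(base_url)
    @user = user
    @password = password
    @deleted = Set.new
  end

  # Returns the DELETE request to send, or nil if this index was already
  # handled in this process (Set#add? returns nil for duplicates).
  def build_request(index)
    return nil unless @deleted.add?(index)
    req = Net::HTTP::Delete.new("/#{index}")
    req.basic_auth(@user, @password)
    req
  end

  # Sends the request; a 404 just means the index did not exist yet.
  def delete(index)
    req = build_request(index)
    return nil if req.nil?
    Net::HTTP.start(@base.host, @base.port) { |http| http.request(req) }
  end
end
```

Inside the filter, the `OnceDeleter` instance would be created in `init` and `delete(inventory_index(event.get('type'), event.get('host')))` called in `code`.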