Indexing Elasticsearch results with Logstash

Time: 2016-10-11 02:01:37

Tags: elasticsearch logstash

I have the following index:

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

I am running the following search:

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}

The response I get is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "red",
          "doc_count": 4
        },
        {
          "key": "blue",
          "doc_count": 2
        },
        {
          "key": "green",
          "doc_count": 2
        }
      ]
    }
  }
}

My question is: how can I index that response document into a different index?

I tried:

input {
  elasticsearch {
    hosts => "localhost"
    index => "cars"
    query => '{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}'
    size => 500
    scroll => "5m"
    docinfo => true
  }
}

But it doesn't work, because the plugin's search_type is scan, which does not support aggregations.

I also tried:

input {
  file {
    path => "C:\ELK-STACK\logstash-2.3.4\bin\out.json"
    start_position => "beginning"
    codec => json_lines
  }
}

The content of out.json is:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":8,"max_score":1.0,"hits":[{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7l","_score":1.0,"_source":{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7m","_score":1.0,"_source":{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7p","_score":1.0,"_source":{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7o","_score":1.0,"_source":{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7n","_score":1.0,"_source":{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7q","_score":1.0,"_source":{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7r","_score":1.0,"_source":{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }},{"_index":"cars","_type":"transactions","_id":"AVexGB7_99OIq3MORm7s","_score":1.0,"_source":{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }}]}}

After that it produced no output:

Settings: Default pipeline workers: 8

Pipeline main started

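I think this is because the JSON file is not in the shape the json plugin expects and I need to do some preparation first (for example with the Java API), but I would like to avoid that if at all possible.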
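For illustration, the kind of preparation meant here could look like the following Python sketch. It rewrites a search response like the one in out.json as one JSON document per line, which is the shape the json_lines codec is meant for. The function names and the shortened sample response are hypothetical, and whether this alone makes the file input emit events has not been tested.

```python
import json


def hits_to_json_lines(response):
    """Return one compact JSON string per hit's _source in a search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [json.dumps(hit["_source"], separators=(",", ":")) for hit in hits]


def write_json_lines(lines, path):
    """Write each document on its own line, ending every line with a newline."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


# Shortened stand-in for the contents of out.json (two of the eight hits).
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "cars", "_type": "transactions",
             "_source": {"price": 10000, "color": "red",
                         "make": "honda", "sold": "2014-10-28"}},
            {"_index": "cars", "_type": "transactions",
             "_source": {"price": 15000, "color": "blue",
                         "make": "toyota", "sold": "2014-07-02"}},
        ],
    }
}

lines = hits_to_json_lines(sample_response)
for line in lines:
    print(line)
```

Calling write_json_lines(lines, "out_lines.json") would then produce a file (the name is only an example) that the file input can read line by line.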

Thanks!

1 Answer:

Answer 0: (score: 0)

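As you noticed, the elasticsearch input plugin does not support aggregations. You can use the http_poller input plugin instead to send the aggregation query to Elasticsearch periodically (or just once per day). Then, with the elasticsearch output, you can send the resulting aggregations back to ES.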

The configuration basically looks like this (note that the aggregation query needs to be URL-encoded and sent to ES using the source=... parameter):

input {
  http_poller {
    urls => {
      test1 => 'http://localhost:9200/cars/transactions/_search?source=%7B%22size%22%3A0%2C%22aggs%22%3A%7B%22popular_colors%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22color%22%7D%7D%7D%7D'
    }
    # checking once per day
    interval => 86400
    codec => "json"
  }
}
filter {
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_aggs"
  }
}
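
The URL-encoded value of the source parameter above does not have to be written by hand. As a minimal sketch, the following Python snippet builds it from the aggregation query in the question using only the standard library. The helper name build_source_url is hypothetical, and how the source parameter is handled may differ between Elasticsearch versions.

```python
import json
from urllib.parse import quote

# Search endpoint taken from the http_poller configuration above.
ES_SEARCH_URL = "http://localhost:9200/cars/transactions/_search"

# The aggregation query from the question.
query = {
    "size": 0,
    "aggs": {
        "popular_colors": {
            "terms": {"field": "color"}
        }
    },
}


def build_source_url(base_url, query_body):
    """Serialize the query compactly and percent-encode it for ?source=..."""
    compact = json.dumps(query_body, separators=(",", ":"))
    # safe="" also encodes characters such as ":" and "," inside the JSON.
    return base_url + "?source=" + quote(compact, safe="")


url = build_source_url(ES_SEARCH_URL, query)
print(url)
```

The printed URL can be pasted as the value of test1 in the urls setting.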