Need help identifying the type of documents ingested into Elasticsearch (via Logstash)

Asked: 2017-05-15 20:55:55

Tags: csv elasticsearch logstash

I am using Logstash to ingest the csv files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and save them to Elasticsearch. For starters, I am using the following conf to ingest simpsons_characters.csv:

input {
  file {
    path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    columns   => ["id", "name", "normalized_name", "gender"]
    separator => ","
  }
}

output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts   => "localhost"
    action  => "index"
    index   => "simpsons"
  }
}

However, when I query like this: http://localhost:9200/simpsons/name/Lou, where simpsons = index and name = type (I think... not sure),

I get the following response:

{
   "_index": "simpsons",
   "_type": "name",
   "_id": "Lou",
   "found": false
}

So the question is: why am I not getting the right answer? Also, when you bulk-ingest via csv, what is the type of the documents?

Thanks!

2 Answers:

Answer 0: (score: 2)

The default type in the Logstash Elasticsearch output is logs. So however you define the ID (whether you take it from the csv with document_id => "%{id}" or let ES generate its own ID), you can get those documents at http://localhost:9200/simpsons/logs/THE_ID

If you don't know the ID and just want to check whether anything was indexed: http://localhost:9200/simpsons/logs/_search?pretty

If you want to see the mapping of the index, for example to find out its _type: http://localhost:9200/simpsons/_mapping?pretty
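The three URLs above follow one pattern, which can be sketched as a few small helpers. This is only an illustration: it assumes a local Elasticsearch at http://localhost:9200, and the helper names (doc_url, search_url, mapping_url, fetch_json) are made up for this example, not part of any Elasticsearch client.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a single local Elasticsearch node, as in the question.
BASE_URL = "http://localhost:9200"


def doc_url(index, doc_type, doc_id, base=BASE_URL):
    """URL of one document: /<index>/<type>/<id>."""
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return "/".join([base] + parts)


def search_url(index, doc_type, base=BASE_URL):
    """URL that lists documents of a given type in an index."""
    return f"{base}/{quote(index, safe='')}/{quote(doc_type, safe='')}/_search?pretty"


def mapping_url(index, base=BASE_URL):
    """URL that shows the mapping (and therefore the _type) of an index."""
    return f"{base}/{quote(index, safe='')}/_mapping?pretty"


def fetch_json(url):
    """GET a URL and decode the JSON body (needs a running Elasticsearch)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; pass one to fetch_json() to actually query.
    print(doc_url("simpsons", "logs", "THE_ID"))
    print(search_url("simpsons", "logs"))
    print(mapping_url("simpsons"))
```

Note that urlopen raises an HTTPError on a 404, so fetch_json on a missing document fails with an exception rather than returning the "found": false body shown in the question.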

To change the default _type:

  elasticsearch {
    hosts   => "localhost"
    action  => "index"
    index   => "simpsons"
    document_type => "characters"
    document_id => "%{id}"
  }

Answer 1: (score: 1)

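Here you have not specified an id field in the Logstash output. In that case Elasticsearch assigns a random ID to each document, while you are looking up a document with id=Lou. Adding document_id => "%{id}" makes Elasticsearch use the csv id column as the document ID, which should solve your problem.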

output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts   => "localhost"
    action  => "index"
    index   => "simpsons"
    document_id => "%{id}"
  }
}
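One caveat: with document_id => "%{id}" the document ID is whatever the csv id column holds, so a lookup by name such as Lou still has to go through a search on the name field. A minimal sketch of building such a URI search follows, assuming the id column is not the character name and that Elasticsearch runs at http://localhost:9200. The helper name name_search_url is hypothetical.

```python
from urllib.parse import urlencode


def name_search_url(name, index="simpsons", base="http://localhost:9200"):
    """Build a URI search (q=field:value) against the name field."""
    # urlencode percent-escapes the colon and any spaces in the name.
    query = urlencode({"q": f"name:{name}", "pretty": "true"})
    return f"{base}/{index}/_search?{query}"


if __name__ == "__main__":
    print(name_search_url("Lou"))
```

Opening the printed URL in a browser or with curl should return the matching documents in hits, each with its _type and _id.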