Question

I am using Logstash to ingest the CSV files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and save them to Elasticsearch. To start, I am using the following conf to ingest simpsons_characters.csv:

input {
  file {
    path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    columns   => ["id", "name", "normalized_name", "gender"]
    separator => ","
  }
}

output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts   => "localhost"
    action  => "index"
    index   => "simpsons"
  }
}

However, when I query like this: http://localhost:9200/simpsons/name/Lou, where simpsons = index and name = type (I think... not sure)

I get the following response:

{
   "_index": "simpsons",
   "_type": "name",
   "_id": "Lou",
   "found": false
}

So the question is: why am I not getting the right answer? Also, when you bulk-ingest via CSV, what is the type of the documents?

Thanks!

Answer 1

The default type in the Logstash Elasticsearch output is logs. So however you define the ID (whether you take it from the CSV with document_id => "%{id}" or let ES generate its own IDs), you can get those documents at http://localhost:9200/simpsons/logs/THE_ID.

If you don't know the ID and just want to check whether anything is there: http://localhost:9200/simpsons/logs/_search?pretty.

If you want to see the mapping of the index, for example to find out the index's _type: http://localhost:9200/simpsons/_mapping?pretty.
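The three URLs mentioned in this answer can be put side by side with a small sketch. It only builds the URLs; the helper names (doc_url, search_url, mapping_url, fetch_json) are made up for illustration, and the base address assumes the local Elasticsearch node from the question.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a local Elasticsearch node, as in the question.
BASE = "http://localhost:9200"


def doc_url(index, doc_type, doc_id):
    """URL of a single document: /<index>/<type>/<id>."""
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return BASE + "/" + "/".join(parts)


def search_url(index, doc_type):
    """URL that lists documents of the given type in the index."""
    return f"{BASE}/{quote(index, safe='')}/{quote(doc_type, safe='')}/_search?pretty"


def mapping_url(index):
    """URL that shows the index mapping, including the type name."""
    return f"{BASE}/{quote(index, safe='')}/_mapping?pretty"


def fetch_json(url):
    """Hypothetical helper: GET a URL and decode the JSON body (needs a running node)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Building the URLs needs no running cluster.
print(doc_url("simpsons", "logs", "THE_ID"))
print(search_url("simpsons", "logs"))
print(mapping_url("simpsons"))
```

Calling fetch_json(mapping_url("simpsons")) against a running node should show which type name the documents were actually indexed under, which is the quickest way to check the "logs" default on your own setup.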

To change the default _type:

  elasticsearch {
    hosts   => "localhost"
    action  => "index"
    index   => "simpsons"
    document_type => "characters"
    document_id => "%{id}"
  }

Answer 2

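Here you have not specified an id field in the Logstash output. In that case Elasticsearch assigns a random ID to each document, yet you are searching for the document with id=Lou. Adding document_id => "%{id}" should solve your problem.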

output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts   => "localhost"
    action  => "index"
    index   => "simpsons"
    document_id => "%{id}"
  }
}
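One caveat worth spelling out: with document_id => "%{id}", the document _id becomes the value of the CSV id column, not the character's name, so a lookup by /Lou would still come back with found: false. To find a character by name, a search on the name field is one option. The sketch below builds such a URI search URL; the helper name name_search_url is hypothetical, and the base address assumes the local node from the question.

```python
from urllib.parse import quote, urlencode

# Assumption: a local Elasticsearch node, as in the question.
BASE = "http://localhost:9200"


def name_search_url(index, field, value):
    """Build a URI search URL of the form /<index>/_search?q=<field>:<value>."""
    query = urlencode({"q": f"{field}:{value}"})  # ':' is percent-encoded as %3A
    return f"{BASE}/{quote(index, safe='')}/_search?{query}"


# Search the name column produced by the csv filter for "Lou".
print(name_search_url("simpsons", "name", "Lou"))
```

Alternatively, if names are unique in the data set, document_id => "%{name}" would make the original lookup by name work, at the cost of tying the _id to the name.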

Need help identifying the type of documents ingested into Elasticsearch (via Logstash)
