I am using Logstash to ingest the csv files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and store them in Elasticsearch. To start with, I am using the following conf to ingest simpsons_characters.csv:
input {
  file {
    path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["id", "name", "normalized_name", "gender"]
    separator => ","
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
  }
}
However, when I query it like this: http://localhost:9200/simpsons/name/Lou
where
simpsons = index
name = type
(I think... not sure)
I get the following response:
{
  "_index": "simpsons",
  "_type": "name",
  "_id": "Lou",
  "found": false
}
So the question is: why am I not getting the right answer? Also, when you bulk-ingest via csv, what is the type of the documents?
Thanks!
Answer 0 (score: 2)
The default type in the Logstash Elasticsearch output is logs. So however the ID is defined (taken from the csv with document_id => "%{id}", or left for ES to generate), you can fetch these documents at http://localhost:9200/simpsons/logs/THE_ID.
If you don't know the ID and just want to check whether anything was indexed: http://localhost:9200/simpsons/logs/_search?pretty
If you want to see the index's mapping, for example to find out which _type the index uses: http://localhost:9200/simpsons/_mapping?pretty
To change the default _type:
elasticsearch {
  hosts => "localhost"
  action => "index"
  index => "simpsons"
  document_type => "characters"
  document_id => "%{id}"
}
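The URLs above all follow the same index/type/id pattern, so it can help to build them in code rather than by hand. Below is a minimal Python sketch, not an official client: the helper names doc_url and search_url are hypothetical, the host is the one from the question, and the second helper assumes Elasticsearch's URI search form (q=field:value) for looking a character up by name when the _id is unknown.

```python
from urllib.parse import quote, urlencode

ES_HOST = "http://localhost:9200"  # host taken from the question


def doc_url(index, doc_type, doc_id, host=ES_HOST):
    """Build the URL for fetching a single document by its _id."""
    parts = (quote(str(p), safe="") for p in (index, doc_type, doc_id))
    return host + "/" + "/".join(parts)


def search_url(index, doc_type, field, value, host=ES_HOST):
    """Build a URI-search URL (q=field:value) for when the _id is unknown."""
    query = urlencode({"q": f"{field}:{value}", "pretty": "true"})
    path = "/".join(quote(p, safe="") for p in (index, doc_type))
    return f"{host}/{path}/_search?{query}"


if __name__ == "__main__":
    # Fetch by ID, using the default "logs" type.
    print(doc_url("simpsons", "logs", 7))
    # Search by name instead of guessing the ID.
    print(search_url("simpsons", "logs", "name", "Lou"))
```

Opening the second URL in a browser (or with curl) returns the matching documents along with their actual _id and _type values, which shows what to put in the first URL.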
Answer 1 (score: 1)
You have not specified an id field in your Logstash output. In that case Elasticsearch assigns a random ID to each document, yet you are requesting the document with id=Lou.
Adding document_id => "%{id}" should solve your problem.
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
    document_id => "%{id}"
  }
}
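To see which _id a row ends up with under a given document_id setting, here is a small Python sketch. It is only a simplified stand-in for what Logstash does: parse_row mimics the csv filter with the columns from the question, expand mimics the %{field} substitution, and the sample row is hypothetical, not taken from the dataset.

```python
import csv
import io
import re

# Column names from the csv filter in the question.
COLUMNS = ["id", "name", "normalized_name", "gender"]


def parse_row(line):
    """Map one CSV line onto the configured column names."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(COLUMNS, values))


def expand(template, event):
    """Simplified stand-in for %{field} expansion.

    Unknown fields are left as the literal %{field} text.
    """
    return re.sub(
        r"%\{(\w+)\}",
        lambda m: event.get(m.group(1), m.group(0)),
        template,
    )


if __name__ == "__main__":
    # Hypothetical row, for illustration only.
    event = parse_row("1,Example Character,example character,m")
    print(expand("%{id}", event))    # _id taken from the id column
    print(expand("%{name}", event))  # _id taken from the name column
```

With document_id => "%{id}" the _id is the value of the csv id column, so a document is fetched by that value rather than by the character's name. Requesting a document by name, as in /simpsons/name/Lou, would need the name column as the ID (document_id => "%{name}") or a search on the name field.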