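After importing documents into Elasticsearch from a CSV file with Logstash, the IDs of my documents are set to long alphanumeric strings. How can I set each document's ID to a numeric value?
This is basically what my Logstash config looks like: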
input {
  file {
    path => "/path/to/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => {
      "year" => "integer"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
  }
  stdout {}
}
Answer 0 (score: 1)
The first and simplest option is to add a new ID column to your CSV and use that field as the document ID.
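If the CSV does not have an ID column yet, one can be generated before the file is handed to Logstash. The following is only a minimal sketch using Ruby's standard csv library; the helper name add_id_column, the starting value of 1, and the two sample rows are assumptions made for illustration and are not part of the original setup.

```ruby
require 'csv'

# Prepend a sequential numeric ID to every row of a headerless CSV string.
# `first_id` is an assumed starting value; choose whatever suits your index.
def add_id_column(csv_text, first_id = 1)
  rows = CSV.parse(csv_text)
  CSV.generate do |out|
    rows.each_with_index do |row, i|
      out << [first_id + i, *row]
    end
  end
end

# Hypothetical sample rows in the title,director,year,country layout.
sample = "Alien,Ridley Scott,1979,USA\nAmelie,Jean-Pierre Jeunet,2001,France\n"
puts add_id_column(sample)
```

With a file prepared this way, the csv filter's columns list would start with "id", and the elasticsearch output would reference that field with document_id => "%{id}".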
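Another option is to use a ruby filter that adds a dynamic ID to your events. The downside of this solution is that if your CSV changes and you re-run the pipeline, each document might not get the same ID. Another downside is that you need to run the pipeline with a single worker (i.e. -w 1), because the id_seq variable cannot be shared between pipeline workers.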
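filter {
  csv {
    columns => ["title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => {
      "year" => "integer"
    }
  }
  # Create a sequential numeric ID. An instance variable (@id_seq) is used
  # because a local variable set in init is not visible inside code.
  ruby {
    init => "@id_seq = 0"
    code => "
      event.set('id', @id_seq)
      @id_seq += 1
    "
  }
}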
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
    document_id => "%{id}"
  }
  stdout {}
}
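To make the first drawback of the counter approach concrete, here is a small standalone Ruby sketch that mimics the counter outside of Logstash. The function name assign_sequential_ids and the movie titles are hypothetical; the point is only that an ID derived from read order changes as soon as a row is added above an existing one.

```ruby
# Mimic the ruby filter's counter: each row gets the next integer,
# so an ID depends purely on the order in which rows are read.
def assign_sequential_ids(titles)
  id_seq = 0
  ids = {}
  titles.each do |title|
    ids[title] = id_seq
    id_seq += 1
  end
  ids
end

# Hypothetical first run over a two-row file.
first_run = assign_sequential_ids(["Alien", "Amelie"])

# Hypothetical second run after a new row was inserted at the top.
second_run = assign_sequential_ids(["Akira", "Alien", "Amelie"])

puts "Alien: #{first_run['Alien']} on the first run, #{second_run['Alien']} on the second"
```

Because "Alien" gets a different ID on the second run, re-indexing would write it under a new document ID instead of updating the existing document, which is why an ID stored in the CSV itself is the more robust choice.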