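I have a configuration that tries to reduce duplicates by querying my cluster for an existing ID and then dropping the event if that ID is found. The reason is that the same event can sometimes show up in several runs of the input, so when the fingerprint is used as the key for an index of the same name, duplicates get inserted whenever the hour or the day rolls over.
Val gave me a great answer here: "Logstash doc_as_upsert cross index in Elasticsearch to eliminate duplicates", but I noticed that the load on the cluster went up.
I'm trying to reduce that load by shrinking the search space: instead of searching every index that matches the name, I only want to look at the index for the previous period.
Here is the relevant filter code: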
filter {
  fingerprint {
    method => "SHA1"
    key => "uniq"
    source => ["ID"]
    target => "[@metadata][fingerprint]"
  }
  ruby {
    init => "require 'time'"
    code => '
      begin;
        event.set("Yesterday", Time.at(Time.now.to_i - (60 * 60 * 24)).utc.strftime("%Y-%m-%d"))
      end'
  }
  elasticsearch {
    hosts => ["elastic04:9204"]
    index => "usage-%{Yesterday}"
    query => "_id:%{[@metadata][fingerprint]}"
    fields => {"_id" => "id_found"}
  }
  if [id_found] {
    drop {}
  }
}
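To rule out the date arithmetic itself, the calculation from the ruby filter can be run outside Logstash. This is only a minimal sketch: `previous_index_name` is a hypothetical helper, and the hourly pattern `%Y-%m-%d-%H` is an assumed naming scheme that does not come from my actual setup.

```ruby
require 'time'

# Hypothetical helper: name of the index that was current
# `offset_seconds` before `now`. Prefix and date pattern are
# parameters so the same code covers daily and hourly indices.
def previous_index_name(now, offset_seconds, prefix: "usage-", pattern: "%Y-%m-%d")
  prefix + Time.at(now.to_i - offset_seconds).utc.strftime(pattern)
end

# Fixed timestamp taken from the log line, so the result is reproducible.
now = Time.utc(2016, 12, 13, 19, 34, 0)

# Daily index, one day back (same arithmetic as the ruby filter).
puts previous_index_name(now, 60 * 60 * 24)

# Hourly index, one hour back (assumed naming scheme).
puts previous_index_name(now, 60 * 60, pattern: "%Y-%m-%d-%H")
```

The date string comes out as expected, so the value stored in the `Yesterday` field should be fine on its own.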
The problem is that the elasticsearch filter plugin does not seem to expand variables in the index name. This is the error:
[2016-12-13T11:34:02,659][WARN ][logstash.filters.elasticsearch] Failed to query elasticsearch for previous event {:index=>"usage-%{Yesterday}", :query=>"_id:e6fdc447a1d72b2fcf41d05b72de9df0160014b6", :event=>2016-12-13T19:34:00.933Z elastic04 %{message}, :error=>#<Elasticsearch::Transport::Transport::Errors::NotFound: [404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"usage-%{Yesterday}","index":"usage-%{Yesterday}"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"usage-%{Yesterday}","index":"usage-%{Yesterday}"},"status":404}>}
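To make the expectation concrete: I assumed the `index` option would resolve `%{field}` references the same way the `query` option evidently does in the log above. The following is only a rough Ruby illustration of that substitution, not Logstash's actual implementation; `expand_refs` and the sample field value are hypothetical.

```ruby
# Hypothetical, simplified stand-in for %{field} substitution.
# References to fields that are missing are left untouched.
def expand_refs(template, fields)
  template.gsub(/%\{([^}]+)\}/) { fields.key?($1) ? fields[$1].to_s : $& }
end

fields = { "Yesterday" => "2016-12-12" }

# What I expected the filter to query.
puts expand_refs("usage-%{Yesterday}", fields)

# With no substitution applied, the name stays literal,
# which matches the index name in the 404 response.
puts expand_refs("usage-%{Yesterday}", {})
```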