如何避免弹性搜索重复文档?
elasticsearch index docs count(20,010,253)与日志行数(13,411,790)不匹配。
File input plugin.
File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.
nifi:
real time nifi pipeline copies logs from nifi server to elk server.
nifi has rolling log files.
记录elk服务器上的行数:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash输入配置文件:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
file {
path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
type => "test_4"
sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
}
}
filter {
if [type] == "test_4" {
grok {
match => {
"message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
}
}
}
}
output {
if [type] == "test_4" {
elasticsearch {
hosts => "ip:9200"
index => "test_4"
}
}
else {
stdout {
codec => rubydebug
}
}
}
答案 0 :(得分:0)
您可以使用指纹过滤器插件:https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html
这可以是例如用于在插入时创建一致的文档ID 事件进入Elasticsearch,允许Logstash中的事件导致 要更新的现有文件而不是新文件 创建