How to avoid duplicate documents in Elasticsearch

Date: 2018-02-03 02:03:14

Tags: elasticsearch logstash

How can I avoid duplicate documents in Elasticsearch?

The Elasticsearch index docs count (20,010,253) does not match the number of log lines (13,411,790).

From the file input plugin documentation:

File rotation is detected and handled by this input, regardless of whether the file is rotated via a rename or a copy operation.

NiFi:

A real-time NiFi pipeline copies logs from the NiFi server to the ELK server. NiFi uses rolling log files.

Line count of the log files on the ELK server:

wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total 

elasticsearch index docs count:

curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253 

Logstash input configuration file:

cat /mnt/elk/logstash/input_conf_files/test_4.conf

input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}
filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}

1 answer:

Answer 0 (score: 0)

You can use the fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html

From that page:

This can e.g. be used to create consistent document ids when inserting events into Elasticsearch, allowing events in Logstash to cause existing documents to be updated rather than new documents being created.