My goal is to import data from a MySQL table into an Elasticsearch index. The MySQL table has roughly 2.5 million records, but after a while Logstash has inserted at least three times that many documents and it doesn't stop.
The strangest part is that I already tried generating a SHA1 signature of each message and using it as the document_id to avoid duplicates:
input {
  jdbc {
    jdbc_driver_library => "/app/bin/mysql-connector-java-5.1.37-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://database.xxxxxxx.us-west-2.rds.amazonaws.com:3306/test"
    jdbc_page_size => 25000
    jdbc_paging_enabled => true
    statement => "SELECT * FROM Actions"
  }
}

filter {
  ruby {
    code => "
      require 'digest/sha1';
      event['fingerprint'] = Digest::SHA1.hexdigest(event.to_json);
    "
  }
}

output {
  elasticsearch {
    hosts => ["elasticbeanstalk-env:80"]
    index => "test"
    document_type => "action"
    document_id => "%{fingerprint}"
  }
}
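For comparison, I also looked at expressing the same deduplication idea with the built-in fingerprint filter instead of the ruby block, hashing a stable column rather than the whole event. This is only a rough sketch, and it assumes the table has a primary-key column (called id here) that uniquely identifies each row:

filter {
  fingerprint {
    # "id" is an assumed primary-key column; any stable, unique field would do
    source => ["id"]
    target => "fingerprint"
    method => "SHA1"
  }
}

The idea is that the hash would then depend only on that column's value, not on anything Logstash adds to the event, so re-running the pipeline should produce the same document_id for the same row.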