Logstash producing duplicates

Date: 2015-11-22 04:34:58

Tags: mysql elasticsearch logstash elastic-beanstalk logstash-configuration

My goal is to import data from a MySQL table into an Elasticsearch index. The MySQL table has about 2.5 million records, but after a while Logstash has inserted at least three times that many documents and does not stop.

The strangest part is that I already generate a SHA1 signature for each message and use it as the document_id to avoid duplicates:

input {
  jdbc {
    jdbc_driver_library => "/app/bin/mysql-connector-java-5.1.37-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://database.xxxxxxx.us-west-2.rds.amazonaws.com:3306/test"
    jdbc_page_size => 25000
    jdbc_paging_enabled => true
    statement => "SELECT * FROM Actions"
  }
}

filter {
  ruby {
    code => "
      require 'digest/sha1'
      event['fingerprint'] = Digest::SHA1.hexdigest(event.to_json)
    "
  }
}

output {
  elasticsearch {
    hosts => ["elasticbeanstalk-env:80"]
    index => "test"
    document_type => "action"
    document_id => "%{fingerprint}"
  }
}
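
One thing I have been looking at: event.to_json serializes the whole event, including the @timestamp field that Logstash assigns when the row is read, so the same row could hash to a different fingerprint on a later read. Below is a sketch that hashes only stable row columns instead (assuming Actions has an id primary-key column; that column name is my assumption):

filter {
  ruby {
    code => "
      require 'digest/sha1'
      # Hash only stable row data; @timestamp changes on every read,
      # so a fingerprint of the serialized event is not repeatable.
      event['fingerprint'] = Digest::SHA1.hexdigest(event['id'].to_s)
    "
  }
}

The same concern applies to the paged SELECT: without an ORDER BY, the LIMIT/OFFSET pages that jdbc_paging_enabled generates are not guaranteed to be disjoint from one query to the next.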

0 Answers