Avoiding joins by using multiple select statements - Logstash

Time: 2019-06-17 13:03:48

Tags: elasticsearch logstash logstash-jdbc

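I am using Logstash to migrate data from MySQL to Elasticsearch. My MySQL database has a main table named product with many relations. The query that selects from it contains about 46 left outer joins, and the result returned for a single record is very large (50k rows). I therefore plan to split the query into multiple selects, and I used Logstash's jdbc_streaming plugin to do so. Is this solution reasonable?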

Here is a simplified configuration file that describes my implementation (it does not cover all relations):

input {
    jdbc {
        jdbc_driver_library => "mysql-connector-java-5.1.47-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/my-conn?useSSL=false&allowPublicKeyRetrieval=true"
        jdbc_user => "root"
        jdbc_password => "root"
        # Run the statement every minute (cron syntax)
        schedule => "* * * * *"
        #jdbc_paging_enabled => true
        #jdbc_page_size => 10000
        #jdbc_fetch_size => 5
        # Emit one event per product, carrying only its id
        statement => "select product_id from product"
        clean_run => false
    }
}
filter {
    # For each event, look up the product's translations and store the rows in [productTranslations]
    jdbc_streaming {
        jdbc_driver_library => "mysql-connector-java-5.1.47-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/my-conn?useSSL=false&allowPublicKeyRetrieval=true"
        jdbc_user => "root"
        jdbc_password => "root"
        statement => "select * from product_translation where product_id = :id"
        parameters => { "id" => "product_id" }
        target => "productTranslations"
    }
    # Build one aggregated map per product_id using the Ruby helper in demo.rb
    aggregate {
        task_id => "%{product_id}"
        code => "
            require 'C:\elasticsearch\replication\product-replication\demo.rb' ;
            ProductMapping.startAggregate(event, map)
        "
        push_previous_map_as_event => true
        timeout => 5
        timeout_tags => ["aggregate"]
    }

    # Keep only events tagged "aggregate"; drop everything else
    if "aggregate" not in [tags] {
        drop {}
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        document_id => "%{productId}"
        document_type => "product"
        index => "test-products"
    }
    stdout { codec => rubydebug }
}
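
The question does not show demo.rb, so the following is only a guess at what a helper like ProductMapping.startAggregate might do: copy the product id and the rows fetched by jdbc_streaming into the aggregate map. The map keys (productId, translations), the column names (locale, name), and the FakeEvent class are all assumptions made for illustration. FakeEvent imitates only the `get` method of a Logstash event so the sketch can be tried outside Logstash.

```ruby
# Hypothetical stand-in for demo.rb; the real file is not shown in the question.
module ProductMapping
  # Copies the product id and its translation rows from the event into the
  # aggregate map. Key names are assumptions; "productId" is chosen only
  # because the output section uses %{productId} as the document id.
  def self.startAggregate(event, map)
    map['productId'] = event.get('product_id')
    map['translations'] ||= []
    rows = event.get('productTranslations') || []
    rows.each { |row| map['translations'] << row }
    map
  end
end

# Minimal fake of a Logstash event (only `get`), used to exercise the helper
# without a running pipeline.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end
end

# Sample data; the column names "locale" and "name" are invented.
event = FakeEvent.new(
  'product_id' => 7,
  'productTranslations' => [{ 'locale' => 'en', 'name' => 'Chair' }]
)
map = {}
ProductMapping.startAggregate(event, map)
puts map.inspect
```

With a helper of this shape, each additional relation would need its own jdbc_streaming lookup in the filter section and one more key copied into the map.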

0 answers:

No answers