Kafka-connect jdbc-source: optimizing a query that uses GROUP BY

Posted: 2019-07-24 12:46:06

Tags: jdbc apache-kafka-connect

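I have a JDBC source connector running in incrementing mode, but it is very slow. The query I'm using has a join and a GROUP BY, which is very expensive. On top of that, Kafka Connect adds a filter on id and an ORDER BY id.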

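The problem is that Kafka Connect appends the WHERE clause after the GROUP BY, outside the subquery, which is inefficient. Is there a way to optimize this?

This is the query I give to Kafka Connect: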

select * FROM (
    SELECT
      tabl1.hash_key,
      MAX(tabl1.id) AS id,
      MAX(tabl1.canonical_url_id) AS canonical_url_id,
      SUM(table2.some_val) AS some_val
    FROM table2
      JOIN tabl1
        ON tabl1.hash_key = table2.hash_key AND tabl1.canonical_url_id IS NOT NULL
    GROUP BY tabl1.hash_key) AS FOO

It becomes:

select * FROM (
    SELECT
      tabl1.hash_key,
      MAX(tabl1.id) AS id,
      MAX(tabl1.canonical_url_id) AS canonical_url_id,
      SUM(table2.some_val) AS some_val
    FROM table2
      JOIN tabl1
        ON tabl1.hash_key = table2.hash_key AND tabl1.canonical_url_id IS NOT NULL
    GROUP BY tabl1.hash_key) AS FOO
WHERE "id" > $1 ORDER BY "id" ASC
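The sketch below is a rough model of what the log above shows, not the connector's actual code. Incrementing mode treats the configured query as a string and appends a filter and a sort on the incrementing column to the outer query. It runs the query with Python's built-in `sqlite3` against a made-up toy schema and data. `wrap_incrementing` and `make_db` are hypothetical helpers, and sqlite3 uses `?` where the PostgreSQL log shows `$1`. Because the appended filter is on `MAX(tabl1.id)`, which is an aggregate, the database generally has to join and group every row before it can discard groups it has already emitted.

```python
import sqlite3

# The query from the question, with the table-alias typos fixed.
USER_QUERY = """
select * FROM (
    SELECT tabl1.hash_key,
           MAX(tabl1.id) AS id,
           MAX(tabl1.canonical_url_id) AS canonical_url_id,
           SUM(table2.some_val) AS some_val
    FROM table2
    JOIN tabl1
      ON tabl1.hash_key = table2.hash_key
     AND tabl1.canonical_url_id IS NOT NULL
    GROUP BY tabl1.hash_key) AS FOO"""


def wrap_incrementing(query: str, column: str) -> str:
    """Hypothetical model of the suffix seen in the log: the filter and sort
    on the incrementing column are appended to the *outer* query only."""
    return f'{query} WHERE "{column}" > ? ORDER BY "{column}" ASC'


def make_db() -> sqlite3.Connection:
    """Toy in-memory schema and data, made up for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tabl1 (id INTEGER PRIMARY KEY, hash_key TEXT,
                            canonical_url_id INTEGER);
        CREATE TABLE table2 (hash_key TEXT, some_val INTEGER);
        INSERT INTO tabl1 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'a', 11);
        INSERT INTO table2 VALUES ('a', 5), ('b', 7);
    """)
    return conn


conn = make_db()
# Offset 1: groups whose MAX(id) is 1 or less are dropped, but only after
# the join and GROUP BY have produced every group.
rows = conn.execute(wrap_incrementing(USER_QUERY, "id"), (1,)).fetchall()
print(rows)  # [('b', 2, 20, 7), ('a', 3, 11, 10)]
```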

This is not very efficient. I would like to change it so that the inner query also has a WHERE clause, like this:

select * FROM (
    SELECT
      tabl1.hash_key,
      MAX(tabl1.id) AS id,
      MAX(tabl1.canonical_url_id) AS canonical_url_id,
      SUM(table2.some_val) AS some_val
    FROM table2
      JOIN tabl1
        ON tabl1.hash_key = table2.hash_key
        AND tabl1.canonical_url_id IS NOT NULL
    WHERE tabl1.id > $1
    GROUP BY tabl1.hash_key) AS FOO
WHERE "id" > $2 ORDER BY "id" ASC
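Before adopting the rewrite, it may be worth checking that it returns the same values. The check below uses sqlite3 and a hypothetical toy dataset, in which hash_key `'a'` has tabl1 rows on both sides of the offset. The inner `WHERE tabl1.id > ?` removes rows before aggregation. `MAX(tabl1.id)` is unaffected, because a group's maximum row always passes the filter. `SUM(some_val)`, and potentially `MAX(canonical_url_id)`, are then computed over fewer rows.

```python
import sqlite3

SETUP = """
    CREATE TABLE tabl1 (id INTEGER PRIMARY KEY, hash_key TEXT,
                        canonical_url_id INTEGER);
    CREATE TABLE table2 (hash_key TEXT, some_val INTEGER);
    -- hash_key 'a' has tabl1 rows on both sides of offset 1
    INSERT INTO tabl1 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'a', 11);
    INSERT INTO table2 VALUES ('a', 5), ('b', 7);
"""

INNER = """
    SELECT tabl1.hash_key,
           MAX(tabl1.id) AS id,
           MAX(tabl1.canonical_url_id) AS canonical_url_id,
           SUM(table2.some_val) AS some_val
    FROM table2
    JOIN tabl1
      ON tabl1.hash_key = table2.hash_key
     AND tabl1.canonical_url_id IS NOT NULL
    {inner_filter}
    GROUP BY tabl1.hash_key"""

OUTER = 'select * FROM ({inner}) AS FOO WHERE "id" > ? ORDER BY "id" ASC'

# Query as Kafka Connect runs it today: one placeholder, outer filter only.
ORIGINAL = OUTER.format(inner=INNER.format(inner_filter=""))
# Proposed rewrite: an extra filter inside the subquery (a second placeholder).
PUSHED_DOWN = OUTER.format(inner=INNER.format(inner_filter="WHERE tabl1.id > ?"))

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
offset = 1  # last id already emitted by the connector

original = conn.execute(ORIGINAL, (offset,)).fetchall()
# Both placeholders (inner first, then outer) get the same offset.
pushed = conn.execute(PUSHED_DOWN, (offset, offset)).fetchall()
print(original)  # [('b', 2, 20, 7), ('a', 3, 11, 10)]
print(pushed)    # [('b', 2, 20, 7), ('a', 3, 11, 5)]
```

The same groups and ids come back, but `some_val` for `'a'` drops from 10 to 5. This happens because tabl1 row 1 is filtered out before the join is aggregated. Whether that is acceptable depends on whether downstream consumers expect the full aggregate or only the contribution of new rows.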

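This would be faster.

Is there a workaround that lets me add another parameter to the JDBC statement?

Or is there a workaround in Kafka Connect itself?

For now, I'm planning to extend this class and add this specific feature.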
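If you write your own querier, the core change is binding the stored offset to both filters. The sketch below is a hypothetical Python stand-in, not the Kafka Connect API. `PushdownIncrementingPoller` remembers the last emitted `id` and uses a named sqlite3 parameter, `:offset`, so that one value fills both filters. The schema and data are made up.

```python
import sqlite3
from typing import List, Tuple

SETUP = """
    CREATE TABLE tabl1 (id INTEGER PRIMARY KEY, hash_key TEXT,
                        canonical_url_id INTEGER);
    CREATE TABLE table2 (hash_key TEXT, some_val INTEGER);
    INSERT INTO tabl1 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'a', 11);
    INSERT INTO table2 VALUES ('a', 5), ('b', 7);
"""


class PushdownIncrementingPoller:
    """Hypothetical stand-in for a custom querier (not the Kafka Connect API).

    The stored offset fills both the inner and the outer filter through a
    single named parameter, :offset, which sqlite3 allows to appear twice.
    """

    QUERY = """
        select * FROM (
            SELECT tabl1.hash_key,
                   MAX(tabl1.id) AS id,
                   MAX(tabl1.canonical_url_id) AS canonical_url_id,
                   SUM(table2.some_val) AS some_val
            FROM table2
            JOIN tabl1
              ON tabl1.hash_key = table2.hash_key
             AND tabl1.canonical_url_id IS NOT NULL
            WHERE tabl1.id > :offset
            GROUP BY tabl1.hash_key) AS FOO
        WHERE "id" > :offset ORDER BY "id" ASC
    """

    def __init__(self, conn: sqlite3.Connection, offset: int = -1) -> None:
        self.conn = conn
        self.offset = offset  # last "id" already emitted; -1 means nothing yet

    def poll(self) -> List[Tuple]:
        rows = self.conn.execute(self.QUERY, {"offset": self.offset}).fetchall()
        if rows:
            # Rows come back sorted by id, so the last one holds the new offset.
            self.offset = rows[-1][1]
        return rows


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
poller = PushdownIncrementingPoller(conn)

first = poller.poll()   # [('b', 2, 20, 7), ('a', 3, 11, 10)]
conn.execute("INSERT INTO tabl1 VALUES (4, 'b', 21)")
second = poller.poll()  # [('b', 4, 21, 7)]
```

Plain JDBC has only positional `?` placeholders. With JDBC, the same idea means binding the offset once per placeholder, for example `stmt.setLong(1, offset); stmt.setLong(2, offset);`. Note that the second poll aggregates group `'b'` over the new tabl1 row only, which is the semantics the inner filter implies.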

0 answers

No answers yet.