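I have a JDBC source connector running in incrementing mode, and it is very slow. The query I use has a join and a GROUP BY, which is already expensive, and on top of that Kafka Connect adds a filter on id and an ORDER BY on id.
The problem is that Kafka Connect appends the WHERE clause after the GROUP BY, which is inefficient. Is there a way to optimize this?
This is the query I give to Kafka Connect: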
select * FROM (
SELECT
tabl1.hash_key,
MAX(tabl1.id) AS id,
MAX(tabl1.canonical_url_id) AS canonical_url_id,
SUM(table2.some_val) AS some_val
FROM table2
JOIN tabl1
ON tabl1.hash_key = table2.hash_key AND canonical_url_id IS NOT NULL
GROUP BY tabl1.hash_key) AS FOO
Kafka Connect turns it into:
select * FROM (
SELECT
tabl1.hash_key,
MAX(tabl1.id) AS id,
MAX(tabl1.canonical_url_id) AS canonical_url_id,
SUM(table2.some_val) AS some_val
FROM table2
JOIN tabl1
ON tabl1.hash_key = table2.hash_key AND canonical_url_id IS NOT NULL
GROUP BY tabl1.hash_key) AS FOO
WHERE "id" > $1 ORDER BY "id" ASC
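To make the wrapping concrete, here is a minimal sketch of the behaviour I am seeing, not the connector's actual code. It assumes an in-memory sqlite3 database in place of the real one (so the placeholder is `?` rather than `$1`), made-up rows, and a hypothetical helper `wrap_incrementing` that simply appends the filter and ordering to whatever query it is given.

```python
import sqlite3

# Hypothetical stand-in data; sqlite3 replaces the real database for illustration only.
SETUP = """
CREATE TABLE tabl1 (id INTEGER PRIMARY KEY, hash_key TEXT, canonical_url_id INTEGER);
CREATE TABLE table2 (hash_key TEXT, some_val INTEGER);
INSERT INTO tabl1 VALUES (1, 'a', 10), (2, 'a', 11), (3, 'b', 12), (4, 'c', NULL);
INSERT INTO table2 VALUES ('a', 5), ('a', 7), ('b', 1), ('c', 9);
"""

# Same shape as the query handed to the connector.
BASE_QUERY = """
SELECT * FROM (
  SELECT
    tabl1.hash_key,
    MAX(tabl1.id) AS id,
    MAX(tabl1.canonical_url_id) AS canonical_url_id,
    SUM(table2.some_val) AS some_val
  FROM table2
  JOIN tabl1
    ON tabl1.hash_key = table2.hash_key AND canonical_url_id IS NOT NULL
  GROUP BY tabl1.hash_key) AS FOO
"""


def wrap_incrementing(query, column="id"):
    """Append the incrementing filter and ordering after the user query (hypothetical helper)."""
    return f'{query.rstrip()}\nWHERE "{column}" > ? ORDER BY "{column}" ASC'


def fetch_after(conn, last_id):
    """Run the wrapped query with the last seen id bound to the single placeholder."""
    return conn.execute(wrap_incrementing(BASE_QUERY), (last_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(wrap_incrementing(BASE_QUERY))
print(fetch_after(conn, 2))
```

The filter is only applied to the already grouped rows, so the join and the aggregation still run over every row on each poll.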
This is not very efficient. I would like to be able to change it so that the inner query also has the WHERE clause, like this:
select * FROM (
SELECT
tabl1.hash_key,
MAX(tabl1.id) AS id,
MAX(tabl1.canonical_url_id) AS canonical_url_id,
SUM(table2.some_val) AS some_val
FROM table2
JOIN tabl1
ON tabl1.hash_key = table2.hash_key
AND canonical_url_id IS NOT NULL
WHERE "id" > $1
GROUP BY tabl1.hash_key) AS FOO
WHERE "id" > $2 ORDER BY "id" ASC
That would be faster.
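As a rough sketch of what I mean by a second variable: the same offset would be bound to both placeholders. This again uses sqlite3 with made-up rows as a stand-in, `?` instead of `$1`/`$2`, and a hypothetical `fetch_after` helper. It also assumes the inner "id" refers to tabl1.id, so it is written out qualified here.

```python
import sqlite3

# Hypothetical stand-in data; sqlite3 replaces the real database for illustration only.
SETUP = """
CREATE TABLE tabl1 (id INTEGER PRIMARY KEY, hash_key TEXT, canonical_url_id INTEGER);
CREATE TABLE table2 (hash_key TEXT, some_val INTEGER);
INSERT INTO tabl1 VALUES (1, 'a', 10), (2, 'a', 11), (3, 'b', 12), (4, 'c', NULL);
INSERT INTO table2 VALUES ('a', 5), ('a', 7), ('b', 1), ('c', 9);
"""

# Assumption: the inner filter is meant to apply to tabl1.id, before grouping.
PUSHED_DOWN = """
SELECT * FROM (
  SELECT
    tabl1.hash_key,
    MAX(tabl1.id) AS id,
    MAX(tabl1.canonical_url_id) AS canonical_url_id,
    SUM(table2.some_val) AS some_val
  FROM table2
  JOIN tabl1
    ON tabl1.hash_key = table2.hash_key
   AND canonical_url_id IS NOT NULL
  WHERE tabl1.id > ?
  GROUP BY tabl1.hash_key) AS FOO
WHERE "id" > ? ORDER BY "id" ASC
"""


def fetch_after(conn, last_id):
    """Bind the same last seen id to the inner and the outer placeholder."""
    return conn.execute(PUSHED_DOWN, (last_id, last_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(fetch_after(conn, 1))
```

Note that the inner filter also restricts the rows that SUM and MAX aggregate over, so the totals can differ from the form that only filters outside the subquery.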
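Is there a workaround that would let me add another variable to the JDBC statement?
Or is there a workaround in Kafka Connect itself?
For now, I am planning to extend this class and add this specific functionality.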