I am trying to improve the performance of a SQL query and have tried several combinations.
Original query
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
FROM db_A.table_A ALIAS_A
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
AND Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
The query above consumed nearly 400k impactCPU.
Optimized query 1
SELECT New_sub_table.id1,
New_sub_table.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM ( sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE ) New_sub_table -- created a subquery
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON New_sub_table.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON New_sub_table.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(New_sub_table.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
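My idea was to filter the data first and join afterwards. When I checked the performance statistics, this version consumed nearly 390k impactCPU, so it made hardly any difference.
Optimized query 2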
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM INTERMEDIATE_DB.INTERMEDIATE_TABLE ALIAS_A --CREATED AN INTERMEDIATE TABLE
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
Macro used to load the data into the intermediate table
INSERT INTO INTERMEDIATE_DB.INTERMEDIATE_TABLE
sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
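So here I used an intermediate table instead of a subquery. The macro loads the intermediate table first, and then the select query runs. This now consumes only 50k impactCPU (for the macro and the select query combined).
My question: I can't explain why this happens, even though the logic behind both queries is the same (or so I think). If this is not the right approach, what is the best practice?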
Answer 0 (score: 1)
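Your main problem is the Cast(ALIAS_A.columnD AS DATE). When you check the Explains, you will notice that the optimizer has no confidence for this step and probably greatly overestimates the number of rows returned.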
But when you materialize the select, the number of rows is known much more accurately, and the join order changes as a result.
You might get the same plan when you collect statistics on Cast(ALIAS_A.columnD AS DATE). Run DIAGNOSTIC HELPSTATS ON FOR SESSION; and the Explain should then show this as a recommended statistic.
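The steps described above could look roughly like the sketch below. It assumes a Teradata release that supports statistics on expressions. The statistics name stats_columnD_date is a made-up placeholder, and the shortened SELECT only stands in for the full original query.

```sql
-- Ask the optimizer to append its recommended statistics to every EXPLAIN
-- issued in this session.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Inspect the confidence level and estimated row count of the step that
-- filters table_A; the recommended statistics are listed at the end of the output.
EXPLAIN
SELECT ALIAS_A.id1,
       ALIAS_A.id2
FROM db_A.table_A ALIAS_A
WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE;

-- Collect statistics on the cast expression itself.
-- Assumption: expression statistics are available on this release;
-- stats_columnD_date is a hypothetical name (expression statistics need one).
COLLECT STATISTICS
    COLUMN (Cast(columnD AS DATE)) AS stats_columnD_date
ON db_A.table_A;
```

After collecting the statistics, running the EXPLAIN again on the original query shows whether the row estimate and the join order now match the plan obtained with the intermediate table.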