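I have a 10-billion-row fact table in Netezza, and I want to compute ROW_NUMBER(), MAX() and SUM() in a single query. When I run it, the query takes more than 3 hours. Is there any way to improve its performance? The table is distributed on 4 columns (COLA, COLB, COLC, COLD), which are also part of the PARTITION BY clauses.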
Sample:
SUM(STR_QTY) OVER (
PARTITION BY
COLA
,COLB
,COLC
,COLD
) AS SLS_RTRN_QTY
,
SUM(STR_QTY_1) OVER (
PARTITION BY
COLA
,COLB
,COLC
,COLD
) AS VAL_QTY
,MIN(ITM_FST_DT) OVER (
PARTITION BY COLA
,COLB
) AS FIRST_DT
,MAX(ITM_LST_DT) OVER (
PARTITION BY COLA
,COLB
) AS LAST_DT
Edit 1: original query
SELECT a.*
FROM (
SELECT F.DT_KEY AS DT_KEY
,F.COL_KEY AS COL_KEY
,F.PCK_ITM_KEY AS PCK_ITM_KEY
,F.COLC AS COLC
,F.COLD AS COLD
,F.COLA AS COLA
,F.COLB AS COLB
,F.SH_QTY AS SH_QTY
,SUM(F.SLS_QTY) OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
) AS SLS_QTY
,SUM(F.SLS_RTRN_QTY) OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
) AS SLS_RTRN_QTY
,SUM(F.PCHSE_QTY) OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
) AS PCHSE_QTY
,MAX(F.LST_ML_DT) OVER (
PARTITION BY F.COLA
,F.COLC
) AS LST_ML_DT
,F.LST_MODFD_DTTM AS LST_MODFD_DTTM
,ROW_NUMBER() OVER (
PARTITION BY F.COLD
,F.COLA
,F.COLB
,F.COLC
,F.COLE ORDER BY F.DT_KEY DESC
) AS RNK
FROM FCT_ITEM F
) a
WHERE a.RNK = 1;
Answer:
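The MIN and MAX windows, which partition only by COLA and COLB, force the entire table to be redistributed on COLA and COLB. With the table hash-distributed on all four columns, rows that share the same (COLA, COLB) values generally sit on different data slices, so they have to be moved before those windows can be computed. If the distribution columns are a subset of each window's PARTITION BY columns, every partition is already on a single data slice and you avoid that expensive redistribution.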
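One way to check this claim on your own system is to compare query plans. The sketch below uses the table and columns from the original query in Edit 1; the two cut-down SELECTs are illustrative only. EXPLAIN VERBOSE is Netezza's plan output, but the exact wording of the plan nodes varies between releases, so treat the comments as what to look for rather than literal output.

```sql
-- Window whose PARTITION BY covers all four distribution columns:
-- each partition should already live on one data slice, so the plan
-- should not need to move rows for this window.
EXPLAIN VERBOSE
SELECT F.COLA, F.COLB, F.COLC, F.COLD,
       SUM(F.SLS_QTY) OVER (
           PARTITION BY F.COLD, F.COLA, F.COLB, F.COLC
       ) AS SLS_QTY
FROM FCT_ITEM F;

-- Window partitioned by only two of the four distribution columns:
-- expect the plan to redistribute the table on these columns
-- before the window aggregate runs.
EXPLAIN VERBOSE
SELECT F.COLA, F.COLC,
       MAX(F.LST_ML_DT) OVER (
           PARTITION BY F.COLA, F.COLC
       ) AS LST_ML_DT
FROM FCT_ITEM F;
```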
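As a general rule, use as few columns as possible in the DISTRIBUTE ON clause while still keeping the distribution reasonably even. If COLA or COLB alone gives an even distribution, pick one of them.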
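A minimal sketch of that change, under stated assumptions: for the original query in Edit 1, COLA is the only column that appears in every PARTITION BY (the MAX window uses COLA, COLC), so COLB alone would not avoid the redistribution for that window. The sketch therefore assumes COLA is evenly distributed, which the skew query is there to verify. FCT_ITEM_BY_COLA and FCT_ITEM_OLD are hypothetical table names, and the CTAS makes a full copy of the 10-billion-row table, so plan for the disk space.

```sql
-- Hypothetical copy of FCT_ITEM, hash-distributed on COLA only
-- (the one column shared by every PARTITION BY in the original query).
CREATE TABLE FCT_ITEM_BY_COLA AS
SELECT * FROM FCT_ITEM
DISTRIBUTE ON (COLA);

-- Skew check: rows per data slice. Counts that differ widely
-- mean COLA alone does not distribute evenly enough.
SELECT DATASLICEID, COUNT(*) AS ROW_CNT
FROM FCT_ITEM_BY_COLA
GROUP BY DATASLICEID
ORDER BY ROW_CNT DESC;

-- Refresh optimizer statistics on the new table.
GENERATE STATISTICS ON FCT_ITEM_BY_COLA;

-- If the skew is acceptable, swap the tables,
-- keeping the old copy until the results are verified.
ALTER TABLE FCT_ITEM RENAME TO FCT_ITEM_OLD;
ALTER TABLE FCT_ITEM_BY_COLA RENAME TO FCT_ITEM;
```

If the per-slice counts are badly skewed, keeping a second column in the key restores evenness, at the cost of redistributing for any window that does not partition by both columns.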