Should I put the row-number filter in the join condition or in a preceding CTE?

Asked: 2017-10-17 22:58:42

Tags: sql etl query-performance impala

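I have a subscription table and a payments table that I need to join. I am trying to decide between two options, and performance is a key consideration.

Which of the two options below will perform better?

I am using Impala, and the tables are large (millions of rows). I only need one row per (id, date) group, hence the row_number() analytic function.

I have shortened the queries to illustrate my question:

Option 1: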

WITH cte
   AS (
   SELECT *
      , SUM(payment_amount) OVER (PARTITION BY id, date) 
        AS sameday_total
      , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
        AS sameday_rownum
   FROM payments
), 
payment
AS (
    SELECT * 
    FROM cte
    WHERE sameday_rownum = 1
    )
    SELECT s.* 
       , p.sameday_total
    FROM subscription s
    INNER JOIN payment p ON s.id = p.id

Option 2:

WITH payment
   AS (
   SELECT *
          , SUM(payment_amount) OVER (PARTITION BY id, date) 
            AS sameday_total
          , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
            AS sameday_rownum
   FROM payments
)
SELECT s.*
       , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
                  AND p.sameday_rownum = 1

1 Answer:

Answer 0 (score: 1)

There is also an "option 0": a more traditional "derived table", which does not need any CTE at all.

SELECT s.*
       , p.sameday_total
FROM subscription s
INNER JOIN (
           SELECT *
             , SUM(payment_amount) OVER (PARTITION BY id, date) 
                 AS sameday_total
             , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
                AS sameday_rownum
           FROM payments
           ) p ON s.id = p.id
                  AND p.sameday_rownum = 1

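Options 0, 1, and 2 are all likely to produce the same or very similar explain plans (although I am more confident making that statement about SQL Server than about Impala).

Using a CTE does not, in itself, make a query more efficient or faster, so the syntax change between options 1 and 2 is not a significant one. I prefer option 0 myself, because I like to reserve CTEs for specific tasks (such as recursion).

What you should do is use explain plans to study what each option produces.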
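As a rough sketch of that comparison in impala-shell, assuming the table and column names from the question: prefix each variant with EXPLAIN and compare the resulting plans. EXPLAIN and the EXPLAIN_LEVEL query option are standard Impala features; the plans you get depend on your data and table statistics, so no expected output is shown here.

```sql
-- Optional: request a more detailed plan than the default level.
SET EXPLAIN_LEVEL=2;

-- Plan for option 1: filter on the row number before the join.
-- `date` is backtick-quoted in case it is treated as a reserved word.
EXPLAIN
WITH cte AS (
    SELECT id
         , SUM(payment_amount) OVER (PARTITION BY id, `date`) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, `date`
                              ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
), payment AS (
    SELECT * FROM cte WHERE sameday_rownum = 1
)
SELECT s.*, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id;

-- Plan for option 2: filter on the row number inside the join condition.
EXPLAIN
WITH payment AS (
    SELECT id
         , SUM(payment_amount) OVER (PARTITION BY id, `date`) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, `date`
                              ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
)
SELECT s.*, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
                    AND p.sameday_rownum = 1;
```

If the two plans show the same operators in the same order, the choice between the options is purely stylistic. Plan estimates are generally more trustworthy once COMPUTE STATS has been run on both tables, and after actually executing a query, the impala-shell commands SUMMARY and PROFILE report the measured runtime behavior rather than estimates.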