I have a dataset with the following columns and data:
Customer | Week_number | Amount
cust1 | 0 | 100
cust1 | 1 | 200
cust1 | 3 | 300
cust2 | 0 | 1000
cust2 | 1 | 2000
I need to calculate a two-week (fortnightly) total for each customer.
With a window function I can do this:
SELECT
CUSTOMER, WEEK_NUMBER
, SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS 1 PRECEDING) AS FORTNIGHT_AMOUNT
FROM AMOUNT
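However, this adds the previous row even when there is no amount for the preceding week. In the example above, for the third cust1 row, it adds up week 3 and week 1. The previous amount should only be added when its week_number is exactly one less than the week of the current row. Is this possible? Thanks for your help.
What I get: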
Customer | Week_number | Fortnight_Amount
cust1 | 0 | 100
cust1 | 1 | 300
cust1 | 3 | **500**
cust2 | 0 | 1000
cust2 | 1 | 3000
Expected result:
Customer | Week_number | Fortnight_Amount
cust1 | 0 | 100
cust1 | 1 | 300
cust1 | 3 | **300**
cust2 | 0 | 1000
cust2 | 1 | 3000
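Answer 0 (score: 1)
If there are only two weeks/rows, your query can be simplified further so that Explain shows a single STATS step (because both OLAP functions use the same PARTITION / ORDER):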
SELECT T.*
, CASE
WHEN MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 1 = WEEK_NUMBER
THEN SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
ELSE AMOUNT
END AS TWO_WEEK_SUM_AMOUNT
FROM MY_TABLE T
ORDER BY CUSTOMER, WEEK_NUMBER
Of course, this assumes that weeks start at 0 and that there is no week 52/53 carried over from the previous year.
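To check the single-level query outside Teradata, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later); the function name two_week_sums and the column types are made up for illustration, and SQLite's Explain output says nothing about Teradata's STATS steps.

```python
import sqlite3

# Sample rows copied from the question.
SAMPLE_ROWS = [
    ("cust1", 0, 100),
    ("cust1", 1, 200),
    ("cust1", 3, 300),
    ("cust2", 0, 1000),
    ("cust2", 1, 2000),
]

# Same logic as the query above: MAX over a frame holding only the previous
# row acts as LAG; the CASE adds the previous amount only for adjacent weeks.
SINGLE_LEVEL_QUERY = """
SELECT T.*
     , CASE
         WHEN MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER
                                     ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 1 = WEEK_NUMBER
         THEN SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER
                                ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
         ELSE AMOUNT
       END AS TWO_WEEK_SUM_AMOUNT
FROM MY_TABLE T
ORDER BY CUSTOMER, WEEK_NUMBER
"""


def two_week_sums():
    """Return (customer, week_number, amount, two_week_sum_amount) rows."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the question does not state them.
    conn.execute(
        "CREATE TABLE MY_TABLE (CUSTOMER TEXT, WEEK_NUMBER INTEGER, AMOUNT INTEGER)"
    )
    conn.executemany("INSERT INTO MY_TABLE VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(SINGLE_LEVEL_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in two_week_sums():
        print(row)
```

For the first row of each customer the one-row frame is empty, so MAX returns NULL, the comparison is not true, and the CASE falls through to ELSE AMOUNT.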
Answer 1 (score: 0)
If you only want to ignore weeks that are not immediately consecutive, you can use lag() first and then a window sum():
select
customer,
week_number,
sum(
case when lag_week_number is null or week_number = lag_week_number + 1
then amount
else 0
end
) over(partition by customer order by week_number) fortnight_amount
from (
select
t.*,
lag(week_number) over(partition by customer order by week_number) lag_week_number
from mytable t
) t
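Actually, you may want to reset the sum whenever there is a gap in the week numbers. That makes this a kind of gaps-and-islands problem, which you would approach differently: the idea is to use a cumulative sum that starts a new group whenever two successive rows are not consecutive weeks, and then to sum within each group: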
select
customer,
week_number,
sum(amount) over(partition by customer, grp order by week_number) fortnight_amount
from (
select
t.*,
sum(
case
when lag_week_number is null or week_number = lag_week_number + 1
then 0
else 1
end
) over(partition by customer order by week_number) grp
from (
select
t.*,
lag(week_number) over(partition by customer order by week_number) lag_week_number
from mytable t
) t
) t
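As a rough check of the gaps-and-islands idea, the sketch below runs it in an in-memory SQLite database from Python. It makes two assumptions about what the query intends: the group counter is a running sum (it needs an OVER clause), and the outer sum is ordered by week_number, since the sample data has no week_date column. The subquery aliases lagged and grouped and the function name island_totals are illustrative only; SQLite 3.25 or later is assumed.

```python
import sqlite3

# Sample rows copied from the question.
SAMPLE_ROWS = [
    ("cust1", 0, 100),
    ("cust1", 1, 200),
    ("cust1", 3, 300),
    ("cust2", 0, 1000),
    ("cust2", 1, 2000),
]

ISLANDS_QUERY = """
SELECT customer,
       week_number,
       grp,
       -- running total inside each island of consecutive weeks
       SUM(amount) OVER (PARTITION BY customer, grp ORDER BY week_number) AS fortnight_amount
FROM (
    SELECT lagged.*,
           -- the counter goes up by one at every gap, so each island gets its own grp
           SUM(CASE
                 WHEN lag_week_number IS NULL OR week_number = lag_week_number + 1
                 THEN 0
                 ELSE 1
               END) OVER (PARTITION BY customer ORDER BY week_number) AS grp
    FROM (
        SELECT t.*,
               LAG(week_number) OVER (PARTITION BY customer ORDER BY week_number) AS lag_week_number
        FROM mytable t
    ) lagged
) grouped
ORDER BY customer, week_number
"""


def island_totals():
    """Return (customer, week_number, grp, fortnight_amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (customer TEXT, week_number INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(ISLANDS_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in island_totals():
        print(row)
```

Note that this produces a running total per island: a run of three or more consecutive weeks would keep accumulating, so it matches a strict two-week sum only when no island is longer than two weeks, as in the sample data.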
Answer 2 (score: 0)
You want a range window frame rather than a rows window frame:
SELECT CUSTOMER, WEEK_NUMBER,
SUM(AMOUNT) OVER (PARTITION BY CUSTOMER
ORDER BY WEEK_NUMBER
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
) AS FORTNIGHT_AMOUNT
FROM AMOUNT;
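The difference between the two frame types can be seen by running the same query with ROWS and with RANGE. The sketch below does that in an in-memory SQLite database from Python; it assumes SQLite 3.28 or later (needed for RANGE with a numeric offset), and the helper name fortnight_totals is made up for illustration. It says nothing about whether a given Teradata release accepts the RANGE form.

```python
import sqlite3

# Sample rows copied from the question; the table is named AMOUNT as in the query above.
SAMPLE_ROWS = [
    ("cust1", 0, 100),
    ("cust1", 1, 200),
    ("cust1", 3, 300),
    ("cust2", 0, 1000),
    ("cust2", 1, 2000),
]


def fortnight_totals(frame_mode):
    """Run the fortnight query with a ROWS or a RANGE frame."""
    # frame_mode is spliced into the SQL text, so only the two keywords are allowed.
    if frame_mode not in ("ROWS", "RANGE"):
        raise ValueError("frame_mode must be 'ROWS' or 'RANGE'")
    query = f"""
        SELECT CUSTOMER, WEEK_NUMBER,
               SUM(AMOUNT) OVER (PARTITION BY CUSTOMER
                                 ORDER BY WEEK_NUMBER
                                 {frame_mode} BETWEEN 1 PRECEDING AND CURRENT ROW
                                ) AS FORTNIGHT_AMOUNT
        FROM AMOUNT
        ORDER BY CUSTOMER, WEEK_NUMBER
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AMOUNT (CUSTOMER TEXT, WEEK_NUMBER INTEGER, AMOUNT INTEGER)"
    )
    conn.executemany("INSERT INTO AMOUNT VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # ROWS counts physical rows, so week 3 picks up week 1.
    print([total for _, _, total in fortnight_totals("ROWS")])
    # RANGE compares WEEK_NUMBER values, so week 3 only sees weeks 2 to 3.
    print([total for _, _, total in fortnight_totals("RANGE")])
```

With ROWS the frame is the previous physical row plus the current one, whatever its week; with RANGE the frame is every row whose WEEK_NUMBER lies between the current value minus 1 and the current value, which is what the question asks for.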
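Answer 3 (score: 0)
Thanks to @Gordon and @GMB for their answers. Unfortunately, I could not use either the LAG function or a RANGE frame in Teradata SQL. However, I was able to use the concepts you both described to arrive at the following answer.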
SELECT
CUSTOMER
, WEEK_NUMBER
, LAG_WEEK_NUMBER
, AMOUNT
, CASE
WHEN WEEK_NUMBER = LAG_WEEK_NUMBER + 1
THEN SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
ELSE AMOUNT
END AS TWO_WEEK_SUM_AMOUNT
FROM (
SELECT
T.*
, MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LAG_WEEK_NUMBER
FROM MY_TABLE T
) T
ORDER BY CUSTOMER, WEEK_NUMBER
I got the Teradata implementation of the LAG function from @dnoeth's answer at the following link:
MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LAG_WEEK_NUMBER
rows between 1 preceding and preceding 1
Teradata partitioned query ... following rows dynamically
Please let me know if you find any problems with this answer or if it can be improved in any way.