Hey guys, suppose I have a data frame:
Year  Month  1_month_sub  3_month_sub  12_month_sub
2014  1      3            1            1
2014  2      1            0            0
2014  3      1            0            0
2014  4      1            0            0
2014  5      4            0            0
2014  6      1            0            0
2014  7      5            0            0
2014  8      1            0            0
2014  9      1            0            0
2014  10     6            0            0
2014  11     1            0            0
2014  12     3            0            0
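Here 1_month_sub is the number of 1-month subscriptions purchased in that month, 3_month_sub the number of 3-month subscriptions purchased, and so on. I need to add a column that gives me the number of subscribers active in each month. So the result would be: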
Year  Month  1_month_sub  3_month_sub  12_month_sub  subs
2014  1      3            1            1             5
2014  2      1            0            0             3
2014  3      1            0            0             3
2014  4      1            0            0             2
2014  5      4            0            0             5
2014  6      1            0            0             2
2014  7      5            0            0             6
2014  8      1            0            0             2
2014  9      1            0            0             2
2014  10     6            0            0             7
2014  11     1            0            0             2
2014  12     3            0            0             4
2015  1      1            0            0             1
I tried the COALESCE, LAG and LEAD functions without any real success. Any ideas on how to approach this?
Answer 0 (score: 2)
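I presume the data is in a table, and that a 1-month subscription is active for one month, a 3-month subscription for three months, and a 12-month subscription for twelve months.
I will also assume that there is a row for every month.
You can do this in Postgres with a cumulative sum restricted by a window frame clause: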
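-- Column names that start with a digit must be double-quoted in Postgres.
-- Ordering by year, month makes each frame cover consecutive months.
select t.*,
       ("1_month_sub" +
        sum("3_month_sub") over (order by year, month rows between 2 preceding and current row) +
        sum("12_month_sub") over (order by year, month rows between 11 preceding and current row)
       ) as total_subs
from t;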
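As a sanity check on the frame sizes in the query above (current row, 2 preceding, 11 preceding), here is a minimal Python sketch that applies the same trailing sums to the numbers from the question. It is not part of the original answer. The helper names `trailing_sum` and `active_subs` are made up for illustration, and the 2015-01 row is taken from the question's expected-result table.

```python
# Purchases per month for 2014-01 .. 2015-01, copied from the question.
# The last entry (2015-01) comes from the expected-result table.
one_month = [3, 1, 1, 1, 4, 1, 5, 1, 1, 6, 1, 3, 1]
three_month = [1] + [0] * 12
twelve_month = [1] + [0] * 12


def trailing_sum(values, i, width):
    """Sum of values[i-width+1 .. i], clipped at the start of the list.

    Mirrors "rows between (width - 1) preceding and current row".
    """
    return sum(values[max(0, i - width + 1): i + 1])


def active_subs(one, three, twelve):
    """Active subscribers per month, assuming one row per month in order."""
    return [
        trailing_sum(one, i, 1) + trailing_sum(three, i, 3) + trailing_sum(twelve, i, 12)
        for i in range(len(one))
    ]


subs = active_subs(one_month, three_month, twelve_month)
print(subs)  # [5, 3, 3, 2, 5, 2, 6, 2, 2, 7, 2, 4, 1]
```

The printed list matches the subs column the question asks for, including the drop to 1 in 2015-01 once the 12-month subscription from 2014-01 has expired.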
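Answer 1 (score: 0)
Have you tried something like this?
Edit: @Gordon Linoff's answer is better. It is the same idea, but it wraps the "previous 3" and "previous 12" values in a single expression each!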
-- Column names that start with a digit are double-quoted.
-- lag(col, n, 0) returns the value from n rows back, or 0 if there is no such row.
select "1_month_sub" +
       lag("3_month_sub", 2, 0) over (order by Year, Month) +
       lag("3_month_sub", 1, 0) over (order by Year, Month) +
       "3_month_sub" +
       lag("12_month_sub", 11, 0) over (order by Year, Month) +
       lag("12_month_sub", 10, 0) over (order by Year, Month) +
       lag("12_month_sub", 9, 0) over (order by Year, Month) +
       lag("12_month_sub", 8, 0) over (order by Year, Month) +
       lag("12_month_sub", 7, 0) over (order by Year, Month) +
       lag("12_month_sub", 6, 0) over (order by Year, Month) +
       lag("12_month_sub", 5, 0) over (order by Year, Month) +
       lag("12_month_sub", 4, 0) over (order by Year, Month) +
       lag("12_month_sub", 3, 0) over (order by Year, Month) +
       lag("12_month_sub", 2, 0) over (order by Year, Month) +
       lag("12_month_sub", 1, 0) over (order by Year, Month) +
       "12_month_sub" as subs
from MyTable
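Answer 2 (score: 0)
This can be done without any advanced functions. The query below has been tested on PostgreSQL and MS SQL.
See it working on SQL Fiddle: http://sqlfiddle.com/#!15/74862/4/0
Simple SQL join (the query uses a table Test with columns one_ms, three_ms and twelve_ms):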
select
t1.Year,
t1.Month,
sum(case when ((t2.Year-2014)*12+t2.Month) <= ((t1.Year-2014)*12+t1.Month) and ((t2.Year-2014)*12+t2.Month) - ((t1.Year-2014)*12+t1.Month) > -1 then 1 else 0 end * t2.one_ms) +
sum( case when ((t2.Year-2014)*12+t2.Month) <= ((t1.Year-2014)*12+t1.Month) and ((t2.Year-2014)*12+t2.Month) - ((t1.Year-2014)*12+t1.Month) > -3 then 1 else 0 end * t2.three_ms ) +
sum( case when ((t2.Year-2014)*12+t2.Month) <= ((t1.Year-2014)*12+t1.Month) and ((t2.Year-2014)*12+t2.Month) - ((t1.Year-2014)*12+t1.Month) > -12 then 1 else 0 end * t2.twelve_ms ) as subs
from Test t1
join Test t2
on 1=1
group by t1.Year, t1.Month, ((t1.Year-2014)*12+t1.Month)
order by ((t1.Year-2014)*12+t1.Month)
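It relies on the following characteristic function over the Cartesian product (shown here for a 3-month window, with the reported month as the row and the purchase month as the column):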
1 2 3 4 5
+---------+
1|O . . . .|
2|O O . . .|
3|O O O . .|
4|. O O O .|
5|. . O O O|
+---------+
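The diagram above can be reproduced with a short Python sketch. It uses the same condition as the case expressions in the query (t2 <= t1 and t2 - t1 > -width), assuming rows stand for the reported month and columns for the purchase month. The names `in_window` and `indicator_grid` are hypothetical helpers, not anything from the original answer.

```python
def in_window(row, col, width):
    """True when a purchase made in month `col` is still active in month `row`."""
    return col <= row and col - row > -width


def indicator_grid(size, width):
    """One text line per reported month: 'O' where the purchase month counts."""
    return [
        " ".join("O" if in_window(r, c, width) else "." for c in range(1, size + 1))
        for r in range(1, size + 1)
    ]


grid = indicator_grid(5, 3)
print("\n".join(grid))
```

Calling `indicator_grid(5, 1)` would leave only the diagonal, which corresponds to the 1-month term of the query.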
Result:
year month subs
2014 1 5
2014 2 3
2014 3 3
2014 4 2
2014 5 5
2014 6 2
2014 7 6
2014 8 2
2014 9 2
2014 10 7
2014 11 2
2014 12 4
2015 1 1
To understand it better, you may want to give (t1.Year-2014)*12+t1.Month an alias, e.g. num:
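-- PostgreSQL syntax; in MS SQL omit the keyword "column".
alter table Test add column num int NULL;
-- num counts months from the start of 2014, so 2014-01 is 1 and 2015-01 is 13.
update Test
set num = (Year-2014)*12+Month;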
select
t1.Year,
t1.Month,
sum(case when t2.num <= t1.num and t2.num - t1.num > -1 then 1 else 0 end * t2.one_ms) +
sum( case when t2.num <= t1.num and t2.num - t1.num > -3 then 1 else 0 end * t2.three_ms ) +
sum( case when t2.num <= t1.num and t2.num - t1.num > -12 then 1 else 0 end * t2.twelve_ms ) as subs
from Test t1
join Test t2
on 1=1
group by t1.Year, t1.Month, t1.num
order by t1.num
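To see why the month index and the cross join give the expected numbers across the year boundary, here is a minimal Python sketch of the same logic. It is an illustration only, not the answer's code. `month_num` and `subs_by_cross_join` are made-up names, and the 2015-01 row is taken from the question's expected-result table.

```python
# Rows as (year, month, one_ms, three_ms, twelve_ms), copied from the question.
# The 2015-01 row comes from the expected-result table.
rows = [
    (2014, 1, 3, 1, 1), (2014, 2, 1, 0, 0), (2014, 3, 1, 0, 0),
    (2014, 4, 1, 0, 0), (2014, 5, 4, 0, 0), (2014, 6, 1, 0, 0),
    (2014, 7, 5, 0, 0), (2014, 8, 1, 0, 0), (2014, 9, 1, 0, 0),
    (2014, 10, 6, 0, 0), (2014, 11, 1, 0, 0), (2014, 12, 3, 0, 0),
    (2015, 1, 1, 0, 0),
]


def month_num(year, month):
    """Same formula as the num column: months counted from the start of 2014."""
    return (year - 2014) * 12 + month


def subs_by_cross_join(rows):
    """Pair every reported month (t1) with every purchase month (t2)."""
    result = []
    for y1, m1, *_ in rows:
        n1 = month_num(y1, m1)
        total = 0
        for y2, m2, one, three, twelve in rows:
            diff = month_num(y2, m2) - n1  # t2.num - t1.num
            if diff <= 0:  # only purchases made in or before the reported month
                # Each comparison is a 0/1 indicator, like the case expressions.
                total += one * (diff > -1) + three * (diff > -3) + twelve * (diff > -12)
        result.append((y1, m1, total))
    return result


result = subs_by_cross_join(rows)
for year, month, subs in result:
    print(year, month, subs)
```

Because the comparison uses the running month index rather than the month number alone, the 12-month subscription bought in 2014-01 (index 1) is correctly excluded in 2015-01 (index 13).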