I'm using SAS and have a table that looks like this:
ID | Time | Main | lag_1 | lag_2
----------------------------------------------------------------------------
A | 01 | 0 | 0 | 1
A | 03 | 0 | 0 | 1
A | 04 | 0 | 0 | 0
A | 10 | 1 | 0 | 0
A | 11 | 1 | 0 | 0
A | 12 | 1 | 0 | 0
B | 02 | 1 | 1 | 1
B | 04 | 0 | 1 | 1
B | 07 | 0 | 0 | 1
B | 10 | 1 | 0 | 0
B | 11 | 1 | 0 | 0
B | 12 | 1 | 0 | 0
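(The real table has many more IDs.) The table is sorted by ID and Time. After calculating the total of the Main column (call it tot), I'm trying to calculate 2 more things:
tot_1: the sum of Main over the records for which a record with lag_1 = 1 occurs at some earlier Time within the same ID;
tot_2: the same, but using lag_2.
I'd expect the calculation to give me: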
tot | tot_1 | tot_2
--------------------
7 | 3 | 6
tot_1 should be 3 (0 for ID = A plus 3 for ID = B), and tot_2 should be 6 (3 for ID = A plus 3 for ID = B).
I'm a complete beginner with these kinds of breakdowns, so any help is much appreciated.
Edit: I expect tot_2 >= tot_1, because lag_2 is built from Main events that reach further back in time than lag_1.
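To pin down the definition, here is a small brute-force check written in Python (only as a checking language, not SAS). It assumes Time can be stored as an integer and reads the rule as "a row's Main counts toward tot_k if some strictly earlier row of the same ID has lag_k = 1"; the names `Row`, `flagged_before` and `expected_totals` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    id: str
    time: int  # "01" in the question is stored as 1 here (assumption)
    main: int
    lag_1: int
    lag_2: int


# Sample table from the question, sorted by ID and Time.
rows = [Row(*r) for r in [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]]


def flagged_before(row: Row, lag: str, table: List[Row]) -> bool:
    """True if a strictly earlier row of the same ID has the given lag flag set."""
    return any(
        other.id == row.id and other.time < row.time and getattr(other, lag) == 1
        for other in table
    )


def expected_totals(table: List[Row]) -> Tuple[int, int, int]:
    """Return (tot, tot_1, tot_2) straight from the definition (O(n^2))."""
    tot = sum(r.main for r in table)
    tot_1 = sum(r.main for r in table if flagged_before(r, "lag_1", table))
    tot_2 = sum(r.main for r in table if flagged_before(r, "lag_2", table))
    return tot, tot_1, tot_2


print(expected_totals(rows))  # (7, 3, 6)
```

Under this reading, B 02 does not count toward tot_1 or tot_2 because no earlier row of ID B exists, which is what makes the expected totals 3 and 6 rather than 4 and 7.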
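Answer 0 (score: 4)
This is much easier to do in a data step. That way you can detect the start of each new ID and reset the flags that record whether lag_x has been true so far.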
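data want;
  set have end=eof;
  by id time;
  /* running total of Main over all rows */
  tot + main;
  /* new ID: clear the "lag seen so far" flags */
  if first.id then call missing(any_lag_1, any_lag_2);
  /* count Main only if an earlier row of this ID had lag_x = 1 */
  if any_lag_1 then tot_1 + main;
  if any_lag_2 then tot_2 + main;
  /* write one summary row at the end of the input */
  if eof then output;
  /* update the flags after the test, so a row only affects later rows */
  any_lag_1 + lag_1;
  any_lag_2 + lag_2;
  /* keep tot, tot_1 and tot_2 */
  keep tot: ;
run;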
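The DATA step above can be mirrored in Python to see why the order of its statements matters. This is a hypothetical sketch, not SAS: `itertools.groupby` stands in for BY-group processing and assumes the rows are already sorted by ID and Time, Python's 0 stands in for the missing values set by CALL MISSING (both test as false), and the function name `data_step_totals` is made up for illustration.

```python
from itertools import groupby

# (id, time, main, lag_1, lag_2), sorted by id and time like the SAS input.
rows = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]


def data_step_totals(rows):
    tot = tot_1 = tot_2 = 0  # sum-statement variables start at 0 and are retained
    for _key, by_group in groupby(rows, key=lambda r: r[0]):
        any_lag_1 = any_lag_2 = 0  # reset at the first row of each ID (FIRST.ID)
        for _id, _time, main, lag_1, lag_2 in by_group:
            tot += main
            if any_lag_1:  # some earlier row of this ID had lag_1 = 1
                tot_1 += main
            if any_lag_2:  # some earlier row of this ID had lag_2 = 1
                tot_2 += main
            # Update the flags only after testing them, so the current row
            # affects later rows of the same ID but never itself.
            any_lag_1 += lag_1
            any_lag_2 += lag_2
    return tot, tot_1, tot_2  # the single row written at EOF


print(data_step_totals(rows))  # (7, 3, 6)
```

Moving the two flag updates above the `if` tests would let row B 02 count itself, giving tot_1 = 4 and tot_2 = 7 on this sample instead of 3 and 6.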
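Answer 1 (score: 1)
If I understand correctly, you want these sums per ID. The key is to compare the minimum id under the different conditions and then sum. This is all conditional aggregation: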
select sum(tot) as tot,
       sum(case when id_lag_1 < id_main then tot else 0 end) as tot_1,
       sum(case when id_lag_2 < id_main then tot else 0 end) as tot_2
from (select id, sum(main) as tot,
             min(case when main = 1 then id end) as id_main,
             min(case when lag_1 = 1 then id end) as id_lag_1,
             min(case when lag_2 = 1 then id end) as id_lag_2
      from t
      group by id
     ) t;
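Answer 2 (score: 0)
Considering the calculation of tot_1 and tot_2:
My first step is to look for rows where lag_1 > main (this covers the case you described, where a record with lag_1 = 1 is found at some time before main = 1). I label such rows 'grp_lag_1', and the corresponding rows for lag_2 'grp_lag_2'.
Once the records are labeled, I use max() over (partition by id order by id, time1) to "copy" the label down to every later row of the same ID.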
select *
,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1
,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2
from t
So I get the following result:
+----+-------+------+-------+-------+-----------+-----------+
| id | time1 | main | lag_1 | lag_2 | grp_1 | grp_2 |
+----+-------+------+-------+-------+-----------+-----------+
| A | 01 | 0 | 0 | 1 | | grp_lag_2 |
| A | 03 | 0 | 0 | 1 | | grp_lag_2 |
| A | 04 | 0 | 0 | 0 | | grp_lag_2 |
| A | 10 | 1 | 0 | 0 | | grp_lag_2 |
| A | 11 | 1 | 0 | 0 | | grp_lag_2 |
| A | 12 | 1 | 0 | 0 | | grp_lag_2 |
| B | 02 | 1 | 1 | 1 | | |
| B | 04 | 0 | 1 | 1 | grp_lag_1 | grp_lag_2 |
| B | 07 | 0 | 0 | 1 | grp_lag_1 | grp_lag_2 |
| B | 10 | 1 | 0 | 0 | grp_lag_1 | grp_lag_2 |
| B | 11 | 1 | 0 | 0 | grp_lag_1 | grp_lag_2 |
| B | 12 | 1 | 0 | 0 | grp_lag_1 | grp_lag_2 |
+----+-------+------+-------+-------+-----------+-----------+
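The running `max() over (...)` can be emulated in Python to show how a label, once it appears, carries forward within an ID. This is an illustrative sketch rather than SQL Server itself: it assumes the rows arrive grouped by id, sorts each group by time1, uses `None` for SQL NULL, and `add_group_labels` is a hypothetical name.

```python
from itertools import groupby

# (id, time1, main, lag_1, lag_2), grouped by id as in the example.
rows = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]


def add_group_labels(rows):
    """Append (grp_1, grp_2) to each row.

    A running MAX over a column that is either NULL or one fixed string
    stays NULL until the first match and equals that string afterwards.
    """
    out = []
    for _key, part in groupby(rows, key=lambda r: r[0]):  # partition by id
        grp_1 = grp_2 = None
        for row in sorted(part, key=lambda r: r[1]):  # order by time1
            _id, _time1, main, lag_1, lag_2 = row
            if lag_1 > main:
                grp_1 = "grp_lag_1"
            if lag_2 > main:
                grp_2 = "grp_lag_2"
            out.append(row + (grp_1, grp_2))
    return out


for labeled_row in add_group_labels(rows):
    print(labeled_row)
```

Row B 02 stays unlabeled because lag_1 = lag_2 = main = 1 there, so `lag > main` is false; the label only starts at B 04.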
After this, summing main over the rows labeled grp_lag_1 gives tot_1, and summing it over the rows labeled grp_lag_2 gives tot_2:
select sum(main) as tot_cnt
,sum(case when grp_1='grp_lag_1' then main end) as tot_1
,sum(case when grp_2='grp_lag_2' then main end) as tot_2
from(
select *
,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1
,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2
from t
)x
+---------+-------+-------+
| tot_cnt | tot_1 | tot_2 |
+---------+-------+-------+
| 7 | 3 | 6 |
+---------+-------+-------+
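The outer query is plain conditional aggregation. A minimal Python sketch over the intermediate result shown earlier (the labels are copied from that table, `None` stands for an empty label, and `conditional_sums` is a hypothetical name):

```python
# (id, time1, main, grp_1, grp_2) copied from the intermediate result;
# None stands for an empty (NULL) label.
labeled = [
    ("A", 1, 0, None, "grp_lag_2"), ("A", 3, 0, None, "grp_lag_2"),
    ("A", 4, 0, None, "grp_lag_2"), ("A", 10, 1, None, "grp_lag_2"),
    ("A", 11, 1, None, "grp_lag_2"), ("A", 12, 1, None, "grp_lag_2"),
    ("B", 2, 1, None, None), ("B", 4, 0, "grp_lag_1", "grp_lag_2"),
    ("B", 7, 0, "grp_lag_1", "grp_lag_2"), ("B", 10, 1, "grp_lag_1", "grp_lag_2"),
    ("B", 11, 1, "grp_lag_1", "grp_lag_2"), ("B", 12, 1, "grp_lag_1", "grp_lag_2"),
]


def conditional_sums(rows):
    # Mirrors sum(main), sum(case when grp_1 = 'grp_lag_1' then main end), ...
    # Note: SQL returns NULL when no row matches; Python's sum() returns 0.
    tot_cnt = sum(main for _, _, main, _, _ in rows)
    tot_1 = sum(main for _, _, main, g1, _ in rows if g1 == "grp_lag_1")
    tot_2 = sum(main for _, _, main, _, g2 in rows if g2 == "grp_lag_2")
    return tot_cnt, tot_1, tot_2


print(conditional_sums(labeled))  # (7, 3, 6)
```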
Demo (SQL Server 2012): https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=c17be111dbc3c516afa2bc3dcd3c9e1c