SQL query to count rows based on previous values of a different column

Time: 2018-11-22 14:57:20

Tags: sql sas conditional

I am using SAS and have a table that looks like this

ID | Time | Main | lag_1 | lag_2
----------------------------------------------------------------------------
A  |  01  |   0  |   0   |  1  
A  |  03  |   0  |   0   |  1  
A  |  04  |   0  |   0   |  0  
A  |  10  |   1  |   0   |  0  
A  |  11  |   1  |   0   |  0  
A  |  12  |   1  |   0   |  0  
B  |  02  |   1  |   1   |  1  
B  |  04  |   0  |   1   |  1  
B  |  07  |   0  |   0   |  1  
B  |  10  |   1  |   0   |  0  
B  |  11  |   1  |   0   |  0  
B  |  12  |   1  |   0   |  0  

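except that there are many more IDs. The table is sorted by ID and Time. Having computed the total of the Main column (call it tot), I am trying to calculate two things: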

  1. The total of the Main column, counting only rows after lag_1 has previously been equal to 1 (call it tot_1); and
  2. The same as 1, but for lag_2; call this variable tot_2.

I expect the calculation to give me

tot | tot_1 | tot_2
--------------------
 7  |   3   |   6

because tot_1 should be 3 (0 for ID = A plus 3 for ID = B), and tot_2 should be 6 (3 for ID = A plus 3 for ID = B).
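A minimal sketch of the calculation described above, in Python only to check the arithmetic. The rows are copied from the sample table, and reading "lag_1 became 1 before" as "some strictly earlier row of the same ID has lag_1 = 1" is an assumption. It scans earlier rows directly, so it is quadratic and meant for illustration, not for large tables:

```python
# Sample rows from the question: (ID, Time, Main, lag_1, lag_2),
# already sorted by ID and Time.
ROWS = [
    ("A", 1, 0, 0, 1),
    ("A", 3, 0, 0, 1),
    ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0),
    ("A", 11, 1, 0, 0),
    ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1),
    ("B", 4, 0, 1, 1),
    ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0),
    ("B", 11, 1, 0, 0),
    ("B", 12, 1, 0, 0),
]


def totals_by_definition(rows):
    """Count Main on a row only if an earlier row of the same ID had lag_x = 1.

    Returns (tot, tot_1, tot_2).
    """
    tot = tot_1 = tot_2 = 0
    for i, (row_id, time, main, _, _) in enumerate(rows):
        # All strictly earlier rows belonging to the same ID.
        earlier = [r for r in rows[:i] if r[0] == row_id and r[1] < time]
        tot += main
        if any(r[3] == 1 for r in earlier):  # lag_1 was 1 before this row
            tot_1 += main
        if any(r[4] == 1 for r in earlier):  # lag_2 was 1 before this row
            tot_2 += main
    return tot, tot_1, tot_2


print(totals_by_definition(ROWS))  # (7, 3, 6)
```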

I am a complete beginner at this kind of breakdown, so any help is greatly appreciated.

Edit: I expect tot_2 >= tot_1, because lag_2 is built from Main events that go further back in time than those behind lag_1.

3 answers:

Answer 0 (score: 4)

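This is much easier to do in a data step. That way you can detect the start of each new ID and reset the flags that record whether the lag_x variables have been true.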

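data want ;
  set have end=eof;
  by id time ;
  tot + main ;                                         /* running total of Main */
  if first.id then call missing(any_lag_1,any_lag_2);  /* reset flags at each new ID */
  if any_lag_1 then tot_1 + main ;                     /* lag_1 was 1 on an earlier row of this ID */
  if any_lag_2 then tot_2 + main ;                     /* same test for lag_2 */
  if eof then output;                                  /* one output row with the final totals */
  any_lag_1+lag_1;                                     /* update flags only after testing them */
  any_lag_2+lag_2;
  keep tot: ;                                          /* keeps tot, tot_1 and tot_2 */
run;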
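To make the order of operations in the data step easier to follow, here is a rough Python translation (an illustration, not SAS). `itertools.groupby` stands in for `BY id` / `FIRST.id`, and the flags are tested before the current row's lag values are added to them, which is what restricts the count to rows after an earlier lag_x = 1:

```python
from itertools import groupby

# Sample rows from the question: (ID, Time, Main, lag_1, lag_2), sorted by ID and Time.
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]


def totals_like_data_step(rows):
    """Single pass over rows sorted by ID, mimicking the retained SAS variables."""
    tot = tot_1 = tot_2 = 0
    for _, group in groupby(rows, key=lambda r: r[0]):  # like BY id
        any_lag_1 = any_lag_2 = 0  # reset at the first row of each ID
        for _, _, main, lag_1, lag_2 in group:
            tot += main
            if any_lag_1:  # lag_1 was 1 on an earlier row of this ID
                tot_1 += main
            if any_lag_2:
                tot_2 += main
            any_lag_1 += lag_1  # update the flags after testing them
            any_lag_2 += lag_2
    return tot, tot_1, tot_2


print(totals_like_data_step(ROWS))  # (7, 3, 6)
```

Moving the two flag updates above the `if` tests would also count rows where lag_x becomes 1 on the same row, which is a different condition.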

Answer 1 (score: 1)

If I understand correctly, you want these sums per ID. The key is to compare, within each ID, the earliest Time at which each column equals 1, and then sum. This is all conditional aggregation:

select sum(tot) as tot,
       sum(case when id_lag_1 < id_main then tot else 0 end) as tot_1,
       sum(case when id_lag_2 < id_main then tot else 0 end) as tot_2
from (select id, sum(main) as tot,
             min(case when main = 1 then time end) as id_main,
             min(case when lag_1 = 1 then time end) as id_lag_1,
             min(case when lag_2 = 1 then time end) as id_lag_2
      from t 
      group by id
     ) t;
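A sketch for trying this conditional-aggregation query locally, using Python's built-in `sqlite3` module and an in-memory table. Assumptions: the table is named `t`, Time is stored as an integer column named `time`, and the per-ID minimums are taken over `time`. On the sample rows it returns (7, 0, 3): for ID B, lag_1, lag_2 and Main all first equal 1 at Time 02, so the strict `<` comparison leaves B out of tot_1 and tot_2.

```python
import sqlite3

# Sample rows from the question: (ID, Time, Main, lag_1, lag_2).
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]

# Per-ID totals plus the first Time each flag equals 1, then conditional sums.
QUERY = """
select sum(tot) as tot,
       sum(case when id_lag_1 < id_main then tot else 0 end) as tot_1,
       sum(case when id_lag_2 < id_main then tot else 0 end) as tot_2
from (select id, sum(main) as tot,
             min(case when main = 1 then time end) as id_main,
             min(case when lag_1 = 1 then time end) as id_lag_1,
             min(case when lag_2 = 1 then time end) as id_lag_2
      from t
      group by id
     ) t
"""


def run(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (id text, time integer, main integer,"
            " lag_1 integer, lag_2 integer)"
        )
        con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchone()
    finally:
        con.close()


print(run(ROWS))  # (7, 0, 3)
```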

Answer 2 (score: 0)

Consider how tot_1 and tot_2 are calculated.

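My first step is to look for the pattern lag_1 > main (this captures the condition you described, i.e. a record with lag_1 = 1 is found at some point before main = 1). I label all such rows 'grp_lag_1', and likewise 'grp_lag_2' for lag_2.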

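Once the records are labelled, I use max() over (partition by id order by id, time1) to "copy" the label down to every later row of the same ID.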

select *
      ,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1 
      ,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2 
  from t
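The "copy down" step relies on the default window frame: with an ORDER BY, max() over (...) considers every row from the start of the partition up to the current row. The plain-Python sketch below emulates that running maximum on the sample rows, assuming Time is unique within each ID (so there are no ties in the ordering). The labels it produces should match the grp_1 and grp_2 columns in the result table:

```python
# Sample rows from the question: (ID, Time, Main, lag_1, lag_2), sorted by ID and Time.
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]


def running_labels(rows, lag_index, label):
    """Emulate max(case when lag > main then label end) over (partition by id order by time)."""
    labels = []
    current_id, seen = None, False
    for row in rows:
        row_id, main, lag = row[0], row[2], row[lag_index]
        if row_id != current_id:  # first row of a new partition (ID)
            current_id, seen = row_id, False
        if lag > main:  # the CASE condition holds on this row
            seen = True
        # Running max over {label, NULL}: label once the condition has ever held.
        labels.append(label if seen else None)
    return labels


grp_1 = running_labels(ROWS, 3, "grp_lag_1")
grp_2 = running_labels(ROWS, 4, "grp_lag_2")
for row, g1, g2 in zip(ROWS, grp_1, grp_2):
    print(row, g1 or "", g2 or "")
```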

So I get the following result:

+----+-------+------+-------+-------+-----------+-----------+
| id | time1 | main | lag_1 | lag_2 |   grp_1   |   grp_2   |
+----+-------+------+-------+-------+-----------+-----------+
| A  |    01 |    0 |     0 |     1 |           | grp_lag_2 |
| A  |    03 |    0 |     0 |     1 |           | grp_lag_2 |
| A  |    04 |    0 |     0 |     0 |           | grp_lag_2 |
| A  |    10 |    1 |     0 |     0 |           | grp_lag_2 |
| A  |    11 |    1 |     0 |     0 |           | grp_lag_2 |
| A  |    12 |    1 |     0 |     0 |           | grp_lag_2 |
| B  |    02 |    1 |     1 |     1 |           |           |
| B  |    04 |    0 |     1 |     1 | grp_lag_1 | grp_lag_2 |
| B  |    07 |    0 |     0 |     1 | grp_lag_1 | grp_lag_2 |
| B  |    10 |    1 |     0 |     0 | grp_lag_1 | grp_lag_2 |
| B  |    11 |    1 |     0 |     0 | grp_lag_1 | grp_lag_2 |
| B  |    12 |    1 |     0 |     0 | grp_lag_1 | grp_lag_2 |
+----+-------+------+-------+-------+-----------+-----------+

After that, summing Main over the rows labelled grp_lag_1 gives tot_1, and summing it over the rows labelled grp_lag_2 gives tot_2.

 select sum(main) as tot_cnt
       ,sum(case when grp_1='grp_lag_1' then main end) as tot_1
       ,sum(case when grp_2='grp_lag_2' then main end) as tot_2
 from(      
select *
      ,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1 
      ,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2 
  from t
  )x


+---------+-------+-------+
| tot_cnt | tot_1 | tot_2 |
+---------+-------+-------+
|       7 |     3 |     6 |
+---------+-------+-------+
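To reproduce this result without SQL Server, the same query can be run against an in-memory SQLite database from Python, as in this sketch. It assumes SQLite 3.25 or later (the first release with window functions) and a table `t` with the column names used in the answer, including `time1`:

```python
import sqlite3

# Sample rows from the question: (ID, Time, Main, lag_1, lag_2).
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]

# Label rows with a running max per ID, then sum Main over the labelled rows.
QUERY = """
select sum(main) as tot_cnt,
       sum(case when grp_1 = 'grp_lag_1' then main end) as tot_1,
       sum(case when grp_2 = 'grp_lag_2' then main end) as tot_2
from (select *,
             max(case when lag_1 > main then 'grp_lag_1' end)
                 over (partition by id order by id, time1) as grp_1,
             max(case when lag_2 > main then 'grp_lag_2' end)
                 over (partition by id order by id, time1) as grp_2
      from t) as x
"""


def run(rows):
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (id text, time1 integer, main integer,"
            " lag_1 integer, lag_2 integer)"
        )
        con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchone()
    finally:
        con.close()


print(run(ROWS))  # (7, 3, 6)
```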

Demo: https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=c17be111dbc3c516afa2bc3dcd3c9e1c