Question

I have a table with 4 columns. It looks like this:

 +----+------------+------+-----+
 | ID | date       | a1   | a2  |
 +----+------------+------+-----+
 | 1  | 31AUG2015  | 100  | 70  |
 +----+------------+------+-----+
 | 1  | 01SEPT2015 | 150  | 80  |
 +----+------------+------+-----+
 | 2  | 31AUG2015  | 900  | 0   |
 +----+------------+------+-----+
 | 2  | 01SEPT2015 | 150  | 100 |
 +----+------------+------+-----+

For each ID, I want to add up a1 and a2 over all rows up to and including that date, so that I end up with something like this:

 +----+------------+------+-----+
 | ID | date       | a1   | a2  |
 +----+------------+------+-----+
 | 1  | 31AUG2015  | 100  | 70  |
 +----+------------+------+-----+
 | 1  | 01SEPT2015 | 250  | 150 |
 +----+------------+------+-----+
 | 2  | 31AUG2015  | 900  | 0   |
 +----+------------+------+-----+
 | 2  | 01SEPT2015 | 1050 | 100 |
 +----+------------+------+-----+

Here is my attempt, a self-join on all rows up to that date:

proc sql;
create table want as
select
    a.id
    ,a.date
    ,sum(a.a1)
    ,sum(a.a2)
from 
    have a,
    have b
where 
    a.id = b.id and
    a.dt <=  b.dt

group by
    a.id
    ,a.date

quit;

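The result is a mess; nothing matches what I expect. I'm sure I've made a big mistake somewhere, but I'd appreciate some guidance on fixing either the PROC SQL or writing it as a data step.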

Answer 1

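This type of logic is much easier in a data step. The code below builds the running total of a2 in a new variable, a3. The rename back to a2 is currently commented out so that you can inspect the logic and verify it.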

data want /*(rename= (a3=a2)) */;
   set have;
   by ID date; *assumes correct ordering of data;
   if first.id then a3 = a2;
   else a3 + a2;
   *drop a2;
run;

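FIRST.ID resets the running total at the start of each ID; otherwise the value keeps accumulating through a3 + a2. This is called a SUM statement, and it implies a RETAIN on the variable a3, which means its value is carried over from one row to the next.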

Answer 2

A data step makes this easy. Sort the data by ID and date, then use sum statements to accumulate the values.

proc sort data=have;
     by id date;
run;

data want;
   set have;
   by id date;

   /* Reset cumulative sum at the start of each ID */
   if(first.id) then call missing(a1_cume, a2_cume);

   a1_cume+a1;
   a2_cume+a2;
run;

Answer 3

One approach is a correlated subquery:

proc sql;
    select h.*,
           (select sum(h2.a1)
            from have h2
            where h2.id = h.id and h2.date <= h.date
           ) as running_a1,
           (select sum(h2.a2)
            from have h2
            where h2.id = h.id and h2.date <= h.date
           ) as running_a2
    from have h;
quit;
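The self-join from the question can also be made to work. The sketch below assumes one row per ID and date (duplicate dates within an ID would be counted more than once) and that the question's `dt` was meant to be the `date` column. Compared with the attempt, the sums are taken over `b`, the "earlier or same" rows, and the date comparison runs the other way: `b.date <= a.date`. The HAVE step only recreates the sample data; it writes `01SEPT2015` as `01SEP2015` so that the DATE9. informat can read it.

```sas
/* Sample data from the question (01SEPT2015 assumed to mean 01SEP2015) */
data have;
   input id date :date9. a1 a2;
   format date date9.;
   datalines;
1 31AUG2015 100 70
1 01SEP2015 150 80
2 31AUG2015 900 0
2 01SEP2015 150 100
;
run;

proc sql;
   create table want as
   select a.id,
          a.date,
          sum(b.a1) as a1,   /* add up the earlier rows (b), not the current row (a) */
          sum(b.a2) as a2
   from have as a
        inner join have as b
        on a.id = b.id and b.date <= a.date   /* every b row on or before a's date */
   group by a.id, a.date
   order by a.id, a.date;
quit;
```

Each output row of `a` is paired with all `b` rows of the same ID dated on or before it, so grouping by `a.id, a.date` produces the running totals shown in the desired table.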

That said, if you are using pass-through SQL, you should use window functions:

sum(a1) over (partition by id order by date)

If you are working in SAS itself (rather than pass-through SQL), you should probably use a data step with RETAIN.
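A minimal sketch of that RETAIN approach, using an explicit RETAIN statement instead of the implicit one a sum statement provides. It assumes the same sample data (with `01SEP2015` as the readable date) and overwrites a1 and a2 with the running totals so the output has the same columns as the desired table; the names `cum_a1` and `cum_a2` are illustrative only.

```sas
/* Sample data from the question (01SEPT2015 assumed to mean 01SEP2015) */
data have;
   input id date :date9. a1 a2;
   format date date9.;
   datalines;
1 31AUG2015 100 70
1 01SEP2015 150 80
2 31AUG2015 900 0
2 01SEP2015 150 100
;
run;

/* BY-group processing needs the data sorted by ID and date */
proc sort data=have;
   by id date;
run;

/* DROP= is applied before RENAME=, so the original a1/a2 are dropped
   and the running totals take over their names */
data want(drop=a1 a2 rename=(cum_a1=a1 cum_a2=a2));
   set have;
   by id date;
   retain cum_a1 cum_a2;        /* keep the totals across rows */
   if first.id then do;         /* restart the totals for each new ID */
      cum_a1 = 0;
      cum_a2 = 0;
   end;
   cum_a1 = sum(cum_a1, a1);    /* SUM() ignores missing values */
   cum_a2 = sum(cum_a2, a2);
run;
```

The explicit RETAIN makes the carried-over state visible, which can be easier to follow than the implicit RETAIN of `cum_a1 + a1;`; both forms give the same totals here.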
