I have a table with 4 columns. It looks like this:
+----+------------+------+-----+
| ID | date       | a1   | a2  |
+----+------------+------+-----+
| 1  | 31AUG2015  | 100  | 70  |
| 1  | 01SEPT2015 | 150  | 80  |
| 2  | 31AUG2015  | 900  | 0   |
| 2  | 01SEPT2015 | 150  | 100 |
+----+------------+------+-----+
For each ID, I want a1 and a2 summed over all rows up to and including each date, so that I end up with something more like this:
+----+------------+------+-----+
| ID | date       | a1   | a2  |
+----+------------+------+-----+
| 1  | 31AUG2015  | 100  | 70  |
| 1  | 01SEPT2015 | 250  | 150 |
| 2  | 31AUG2015  | 900  | 0   |
| 2  | 01SEPT2015 | 1050 | 100 |
+----+------------+------+-----+
Here is my attempt, a self-join on all rows up to that date:
proc sql;
create table want as
select
a.id
,a.date
,sum(a.a1)
,sum(a.a2)
from
have a,
have b
where
a.id = b.id and
a.dt <= b.dt
group by
a.id
,a.date
quit;
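The result is a mess; nothing matches what I expect. I'm sure I've made a big mistake somewhere, but I'd appreciate some guidance on fixing either the PROC SQL or a data step.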
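For comparison, here is a sketch of what the self-join needs to compute. It keeps every `b` row of the same ID dated on or before the current `a` row, then sums the `b` columns. The attempt above sums `a.a1`/`a.a2` instead. It also refers to `a.dt`/`b.dt` although the column is `date`, and it lacks a semicolon after the GROUP BY clause. The sketch runs the query in SQLite through Python's standard `sqlite3` module as a stand-in for PROC SQL. Dates are stored as ISO-8601 text so that string comparison follows chronological order; this is an assumption, since in SAS `date` would be a numeric SAS date.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption,
# chosen so that '<=' on strings matches chronological order).
ROWS = [
    (1, "2015-08-31", 100, 70),
    (1, "2015-09-01", 150, 80),
    (2, "2015-08-31", 900, 0),
    (2, "2015-09-01", 150, 100),
]

SELF_JOIN = """
SELECT a.id, a.date,
       SUM(b.a1) AS a1,        -- sum the b side: all rows up to a.date
       SUM(b.a2) AS a2
FROM have AS a
JOIN have AS b
  ON b.id = a.id
 AND b.date <= a.date          -- b rows on or before the current a row
GROUP BY a.id, a.date
ORDER BY a.id, a.date
"""

def running_totals_self_join(rows):
    """Return (id, date, cum_a1, cum_a2) tuples computed by a self-join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE have (id INTEGER, date TEXT, a1 INTEGER, a2 INTEGER)")
    con.executemany("INSERT INTO have VALUES (?, ?, ?, ?)", rows)
    result = con.execute(SELF_JOIN).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in running_totals_self_join(ROWS):
        print(row)
```

Grouping by `a.id, a.date` assumes each ID has at most one row per date. With duplicate dates, those rows would collapse into one output row.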
Answer 0 (score: 3)
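A data step is much easier for this type of logic. This creates the running total in a new variable, a3 (shown here for a2; a1 works the same way). The rename back to a2 is currently commented out so you can inspect the logic and verify it.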
data want /*(rename= (a3=a2)) */;
set have;
by ID date; *assumes correct ordering of data;
if first.id then a3 = a2;
else a3 + a2;
*drop a2;
run;
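FIRST.ID resets the running total; otherwise it keeps accumulating with a3 + a2. This is called a SUM statement: it implies a RETAIN on a3, which means the value is carried over from one row to the next.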
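To make the FIRST.ID and implied-RETAIN behaviour concrete, here is a minimal Python simulation of the data step above. It is an illustration, not SAS itself. It assumes the rows are already sorted by ID and date (as the BY statement requires) and contain no missing values. The real SUM statement also treats missing values as 0, which this sketch does not model.

```python
# (id, date, a2) rows, already sorted by id and date as BY ID DATE assumes.
SAMPLE = [
    (1, "2015-08-31", 70),
    (1, "2015-09-01", 80),
    (2, "2015-08-31", 0),
    (2, "2015-09-01", 100),
]

def data_step_running_a2(rows):
    """Mimic: if first.id then a3 = a2; else a3 + a2;

    Returns a list of (id, date, a2, a3) tuples.
    """
    out = []
    a3 = 0          # lives outside the loop, like the RETAIN the SUM statement implies
    prev_id = None
    for i, (row_id, date, a2) in enumerate(rows):
        # first.id is true on the first row and whenever the BY value changes
        first_id = i == 0 or row_id != prev_id
        if first_id:
            a3 = a2             # start a new running total for this ID
        else:
            a3 = a3 + a2        # add to the value carried over from the previous row
        out.append((row_id, date, a2, a3))
        prev_id = row_id
    return out

if __name__ == "__main__":
    for row in data_step_running_a2(SAMPLE):
        print(row)
```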
Answer 1 (score: 3)
A data step makes this easy. Sort by ID and date, then use sum statements to accumulate the values.
proc sort data=have;
by id date;
run;
data want;
set have;
by id date;
/* Reset cumulative sum at the start of each ID */
if(first.id) then call missing(a1_cume, a2_cume);
a1_cume+a1;
a2_cume+a2;
run;
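The same sort-then-accumulate pattern can be sketched in Python with `itertools`. `sorted` plays the role of PROC SORT, `groupby` stands in for BY ID, and restarting `accumulate` per group plays the role of the `call missing(...)` reset at FIRST.ID. The function name `cumulative_by_id` is hypothetical, and dates are assumed to be ISO-8601 strings so that sorting them is chronological.

```python
from itertools import accumulate, groupby
from operator import itemgetter

SAMPLE = [
    (1, "2015-08-31", 100, 70),
    (1, "2015-09-01", 150, 80),
    (2, "2015-08-31", 900, 0),
    (2, "2015-09-01", 150, 100),
]

def cumulative_by_id(rows):
    """Return (id, date, a1, a2, a1_cume, a2_cume) tuples, cumulated within each id."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # like: proc sort; by id date;
    out = []
    for _, group in groupby(ordered, key=itemgetter(0)):  # one group per id
        group = list(group)
        # A fresh accumulate() per group restarts the running totals,
        # matching the reset at the first row of each id.
        a1_cume = accumulate(r[2] for r in group)
        a2_cume = accumulate(r[3] for r in group)
        for r, c1, c2 in zip(group, a1_cume, a2_cume):
            out.append((*r, c1, c2))
    return out

if __name__ == "__main__":
    # Input order does not matter, because the function sorts first.
    for row in cumulative_by_id(reversed(SAMPLE)):
        print(row)
```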
Answer 2 (score: 0)
One approach is a correlated subquery:
proc sql;
select h.*,
(select sum(h2.a1)
from have h2
where h2.id = h.id and h2.date <= h.date
) as running_a1,
(select sum(h2.a2)
from have h2
where h2.id = h.id and h2.date <= h.date
) as running_a2
from have h;
quit;
That said, if you are using pass-through SQL, you should use a window function:
sum(a1) over (partition by id order by date)
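PROC SQL itself does not accept the OVER clause, which is why this only applies to pass-through SQL. The sketch below uses SQLite (version 3.25 or later, which added window functions) as a stand-in for whatever database the pass-through would target; that choice is an assumption. With ORDER BY and no explicit frame, the window runs from the start of the partition up to the current row. Rows that share the same date within an ID would therefore receive the same running total.

```python
import sqlite3

ROWS = [
    (1, "2015-08-31", 100, 70),
    (1, "2015-09-01", 150, 80),
    (2, "2015-08-31", 900, 0),
    (2, "2015-09-01", 150, 100),
]

WINDOWED = """
SELECT id, date, a1, a2,
       SUM(a1) OVER (PARTITION BY id ORDER BY date) AS running_a1,
       SUM(a2) OVER (PARTITION BY id ORDER BY date) AS running_a2
FROM have
ORDER BY id, date
"""

def running_totals_window(rows):
    """Return (id, date, a1, a2, running_a1, running_a2) via window functions."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("SQLite 3.25+ is required for window functions")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE have (id INTEGER, date TEXT, a1 INTEGER, a2 INTEGER)")
    con.executemany("INSERT INTO have VALUES (?, ?, ?, ?)", rows)
    result = con.execute(WINDOWED).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in running_totals_window(ROWS):
        print(row)
```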
If you are working in SAS itself rather than through pass-through, you should probably use a data step with RETAIN.