I want to sum the VOLUME variable for each name (TRD_STCK_CD) and date (TRD_EVENT_DT).
Here is a sample of my data:
+--------------+--------------+-------------+------------------+----------+
| TRD_EVENT_DT | TRD_EVENT_TM | TRD_STCK_CD | TRD_EVENT_ROUFOR | VOLUME   |
+--------------+--------------+-------------+------------------+----------+
| 3/24/2008    | 12:28:01     | ALBZ1       | 12:30            | 15370000 |
| 3/24/2008    | 13:13:44     | ALBZ1       | 13:00            |    15670 |
| 3/24/2008    | 12:20:38     | AZAB1       | 12:30            |  6830000 |
| 3/24/2008    | 13:13:44     | AZAB1       | 13:00            |     6950 |
| 3/24/2008    | 9:14:57      | BALI1       | 9:00             |  7871000 |
| 3/24/2008    | 9:15:06      | BALI1       | 9:30             |  1700000 |
| 3/24/2008    | 9:15:14      | BALI1       | 9:30             |  8500000 |
| 3/24/2008    | 9:15:24      | BALI1       | 9:30             |  5100000 |
| 3/24/2008    | 9:29:27      | BALI1       | 9:30             |  8500000 |
| 3/24/2008    | 12:28:00     | BALI1       | 12:30            |  8500000 |
| 3/24/2008    | 12:28:07     | BALI1       | 12:30            |  8500000 |
| 3/24/2008    | 13:13:44     | BALI1       | 13:00            |     8650 |
+--------------+--------------+-------------+------------------+----------+
I removed some columns for simplicity. In the next step, I want a table like this:
+--------------+--------------+-------------+------------------+----------+------------+
| TRD_EVENT_DT | TRD_EVENT_TM | TRD_STCK_CD | TRD_EVENT_ROUFOR | VOLUME   | Volume_Sum |
+--------------+--------------+-------------+------------------+----------+------------+
| 3/24/2008    | 12:28:01     | ALBZ1       | 12:30            | 15370000 |            |
| 3/24/2008    | 13:13:44     | ALBZ1       | 13:00            |    15670 |   15385670 |
| 3/24/2008    | 12:20:38     | AZAB1       | 12:30            |  6830000 |            |
| 3/24/2008    | 13:13:44     | AZAB1       | 13:00            |     6950 |    6836950 |
| 3/24/2008    | 9:14:57      | BALI1       | 9:00             |  7871000 |            |
| 3/24/2008    | 9:15:06      | BALI1       | 9:30             |  1700000 |            |
| 3/24/2008    | 9:15:14      | BALI1       | 9:30             |  8500000 |            |
| 3/24/2008    | 9:15:24      | BALI1       | 9:30             |  5100000 |            |
| 3/24/2008    | 9:29:27      | BALI1       | 9:30             |  8500000 |            |
| 3/24/2008    | 12:28:00     | BALI1       | 12:30            |  8500000 |            |
| 3/24/2008    | 12:28:07     | BALI1       | 12:30            |  8500000 |            |
| 3/24/2008    | 13:13:44     | BALI1       | 13:00            |     8650 |   48679650 |
+--------------+--------------+-------------+------------------+----------+------------+
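Note the last column. It is generated by summing VOLUME over all rows that share the same TRD_STCK_CD, so each TRD_STCK_CD has exactly one Volume_Sum value, placed on its last row.
Answer 0 (score: 3)
A slightly different implementation of the same idea: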
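To try the first answer's code on the sample above, the rows can be read into a dataset named `have`, which is the name that answer's PROC SORT reads from. This is a sketch under one assumption: TRD_EVENT_DT and the two time columns are stored as numeric SAS date/time values. With numeric values the BY sort in the answer is chronological. If the times were stored as character text, "9:29:27" would sort after "13:13:44", so the last row of a group could land on the wrong time. All eight BALI1 rows are read with the code BALI1, which is what the expected total of 48679650 implies.

```sas
/* Sample rows from the question; numeric date/time storage is an assumption. */
data have;
    input TRD_EVENT_DT :mmddyy10. TRD_EVENT_TM :time8.
          TRD_STCK_CD :$5. TRD_EVENT_ROUFOR :time5. VOLUME;
    format TRD_EVENT_DT mmddyy10. TRD_EVENT_TM time8. TRD_EVENT_ROUFOR time5.;
    datalines;
3/24/2008 12:28:01 ALBZ1 12:30 15370000
3/24/2008 13:13:44 ALBZ1 13:00 15670
3/24/2008 12:20:38 AZAB1 12:30 6830000
3/24/2008 13:13:44 AZAB1 13:00 6950
3/24/2008 9:14:57 BALI1 9:00 7871000
3/24/2008 9:15:06 BALI1 9:30 1700000
3/24/2008 9:15:14 BALI1 9:30 8500000
3/24/2008 9:15:24 BALI1 9:30 5100000
3/24/2008 9:29:27 BALI1 9:30 8500000
3/24/2008 12:28:00 BALI1 12:30 8500000
3/24/2008 12:28:07 BALI1 12:30 8500000
3/24/2008 13:13:44 BALI1 13:00 8650
;
run;
```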
/*Sort by TRD_STCK_CD and temporal variables.*/
proc sort data=have out=have_sorted;
    by TRD_STCK_CD
       TRD_EVENT_DT
       TRD_EVENT_TM;
run;
/*Sum VOLUME until the last of each TRD_STCK_CD is reached.*/
data want;
    set have_sorted;
    by TRD_STCK_CD
       TRD_EVENT_DT
       TRD_EVENT_TM;
    retain tmp_volume_sum;
    tmp_volume_sum + VOLUME;
    if last.TRD_STCK_CD then do;
        Volume_Sum = tmp_volume_sum;
        call missing(tmp_volume_sum);
    end;
    drop tmp_:;
run;
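A commonly used alternative to the retained accumulator is the "double DOW loop". This pattern is not taken from the answer above. The first loop reads one whole TRD_STCK_CD group to compute its total. The second loop re-reads the same group with its own SET pointer and outputs each row, putting the total only on the last one. This is a minimal sketch that assumes the input is already sorted by TRD_STCK_CD. It uses a hypothetical two-column subset of the question's data.

```sas
/* Hypothetical subset of the question's data, already sorted by TRD_STCK_CD. */
data have;
    input TRD_STCK_CD :$5. VOLUME;
    datalines;
ALBZ1 15370000
ALBZ1 15670
AZAB1 6830000
AZAB1 6950
;
run;

data want;
    /* Pass 1: read one BY group and accumulate its total (_total starts missing). */
    do until (last.TRD_STCK_CD);
        set have;
        by TRD_STCK_CD;
        _total = sum(_total, VOLUME);
    end;
    /* Pass 2: re-read the same BY group; only the last row receives the total. */
    do until (last.TRD_STCK_CD);
        set have;
        by TRD_STCK_CD;
        if last.TRD_STCK_CD then Volume_Sum = _total;
        output;
    end;
    drop _total;
run;
```

Because both SET statements read the same dataset independently, no RETAIN or reset logic is needed. Volume_Sum is missing on the earlier rows because it is reset at the top of each DATA step iteration.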
Answer 1 (score: 1)
I simplified this further to something with only 2 columns: code and volume.
Here is how the sample table is created:
data have;
    do code = 'a','b','c';
        do i=1 to floor(5*ranuni(1))+1;
            volume = floor(500*ranuni(1));
            output;
        end;
    end;
    drop i;
run;
First, use PROC SQL to sum volume grouped by code. Save the result in a table and create an index on code.
proc sql noprint;
    create table sums as
        select code, sum(volume) as volume_sum
        from have
        group by code;
    /* A simple index must be named after its column, which is listed in parentheses. */
    create index code on sums(code);
quit;
I assume you have already sorted the table by code. If not, do so first.
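If the table is not sorted yet, a PROC SORT by code before the lookup step is enough. This is a minimal sketch. The small unsorted `have` below is hypothetical and only stands in for the real table.

```sas
/* Hypothetical unsorted table in the same two-column shape (code, volume). */
data have;
    input code $ volume;
    datalines;
b 129
a 485
c 271
b 460
;
run;

/* Sort in place so that BY code (and last.code) work in the next DATA step. */
proc sort data=have;
    by code;
run;
```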
Now we read through HAVE. Set volume_sum to missing on every row, and if we are on the last record for a code, look up its value in the SUMS table.
data want;
    set have;
    by code;
    volume_sum = .;
    if last.code then
        set sums key=code;
run;
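If you would rather not build an index, a BY-group MERGE can produce the same table. This is a swapped-in technique, not the answer's method. The per-code total is renamed to a hypothetical helper `_total` on the way in. Renaming matters because in a one-to-many merge, values read from SUMS are retained across the group. Blanking that variable on an early row would therefore also blank it on the last row. The sketch uses the values printed in the answer as its sample.

```sas
/* Sample in the same shape as HAVE, already sorted by code. */
data have;
    input code $ volume;
    datalines;
a 485
b 129
b 460
c 271
c 265
c 24
c 33
c 409
;
run;

/* Per-code totals; ORDER BY guarantees the order that MERGE ... BY needs. */
proc sql noprint;
    create table sums as
        select code, sum(volume) as volume_sum
        from have
        group by code
        order by code;
quit;

data want;
    /* Keep the merged total in _total so the output column can be set per row. */
    merge have sums(rename=(volume_sum=_total));
    by code;
    /* volume_sum is not read from any dataset, so it is missing at the start of each row. */
    if last.code then volume_sum = _total;
    drop _total;
run;
```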
Printing WANT gives:
code  volume  volume_sum
a        485         485
b        129           .
b        460         589
c        271           .
c        265           .
c         24           .
c         33           .
c        409        1002