I have the following dataset:
Date Occupation Count
Jan2006 Nurse 15
Jan2006 Lawyer 2
Jan2006 Mechanic 3
Feb2006 Economist 2
Feb2006 Lawyer 1
Feb2006 Nurse 5
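The data continues through December 2014, and each month has a different set of occupations and counts. What I want is the yearly total for each occupation. So, assuming the data above contained all the months and counts, I would like my final dataset to look like this: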
Date Occupation Sum
2006 Nurse 20
2006 Lawyer 3
2006 Mechanic 3
2006 Economist 2
and so on until Dec2014.
I tried using first.variable and last.variable as follows, but it did not work.
data want;
set have;
if first.date and first.Occupation then sum = 0;
sum+Count;
if last.date and last.occupation then output;
run;
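This does not give me the output I need. I suspect this can be done easily in SQL, but I am not familiar with SQL, so I am hesitant to use it.
Thanks in advance for your help.
Answer 0 (score: 1)
Try this: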
proc sql;
create table want as
select year(date) as date, occupation,sum(count) as sum from have
group by year(date),occupation;
quit;
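The query above relies on date being a numeric SAS date. The question does not say how the column is stored, so as a hedged sketch: if date were instead a character value such as Jan2006 (an assumption), it could be converted with input() before taking the year. The alias year and the use of calculated are illustrative choices, not part of the original answer.

```sas
proc sql;
  create table want as
  /* input() turns the text 'Jan2006' into a SAS date; year() extracts 2006 */
  select year(input(date, monyy7.)) as year,
         occupation,
         sum(count) as sum
  from have
  /* calculated refers back to the year column defined in the select list */
  group by calculated year, occupation;
quit;
```

Giving the result a name other than date also avoids any ambiguity between the source column and the derived year.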
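Answer 1 (score: 1)
Since you are using SAS, you can take advantage of the fact that procedures such as proc summary group by the formatted values of a variable. So if you apply the year. format to the Date variable, it will automatically be grouped by year.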
data have;
input Date :monyy7. Occupation :$20. Count; /* colon modifier stops each read at the next blank */
format date monyy7.;
datalines;
Jan2006 Nurse 15
Jan2006 Lawyer 2
Jan2006 Mechanic 3
Feb2006 Economist 2
Feb2006 Lawyer 1
Feb2006 Nurse 5
;
run;
proc summary data=have nway;
class date occupation / order=freq; /* order class levels by descending frequency count */
format date year.; /* apply year format to date for grouping purposes */
var count;
output out=want (drop=_:) sum=;
run;
Answer 2 (score: 0)
With a pure data step and proc step approach, you can do it like this:
data test;
infile datalines;
input MonYr monyy7. Occupation $ Count;
datalines;
Jan2006 Nurse 15
Jan2006 Lawyer 2
Jan2006 Mechanic 3
Feb2006 Economist 2
Feb2006 Lawyer 1
Feb2006 Nurse 5
;
run;
proc sort data=test;
by Occupation MonYr Count;
run;
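data result(drop=MonYr Count);
set test;
format MonYr year4.; /* display the date as a year only */
by Occupation MonYr groupformat; /* form BY groups on the formatted (year) value */
if first.MonYr then Sum=0; /* reset at the start of each occupation-year group */
Sum+Count; /* sum statement: Sum is retained across rows automatically */
if last.MonYr; /* keep one row per occupation and year */
Date=Year(MonYr);
run;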
You can either convert the MonYr values to a year first and then sort, or just follow the code above.
Answer 3 (score: 0)
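The first option, deriving the year before sorting, can be sketched as follows. This is only an illustration under stated assumptions: the dataset names test_year and result_by_year and the variable Year are hypothetical, and test is the dataset read in above with MonYr holding a numeric SAS date.

```sas
/* Step 1: derive a numeric year from the SAS date */
data test_year;
  set test;
  Year = year(MonYr);
run;

/* Step 2: sort so that each Year/Occupation group is contiguous */
proc sort data=test_year;
  by Year Occupation;
run;

/* Step 3: accumulate Count within each Year/Occupation group */
data result_by_year(keep=Year Occupation Sum);
  set test_year;
  by Year Occupation;               /* required for first./last. to exist */
  if first.Occupation then Sum = 0; /* first.Occupation also fires when Year changes */
  Sum + Count;                      /* sum statement retains Sum across rows */
  if last.Occupation then output;   /* one row per year and occupation */
run;
```

This is also close to what the attempt in the question was aiming for: first. and last. variables are only created when the data step has a BY statement, and the grouping has to be on the year rather than on the month-level date.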
select substring([date],charindex('2',[date]),len([date])),Occupation,sum([count])
from sas group by substring([date],charindex('2',[date]),len([date])),Occupation