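I need some help writing a query in SAS PROC SQL.
Consider the following data set, which stores sales from different regions in 3-hour blocks (this is only a subset; the actual data covers all 24 hours):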
Date ObsAtHour Region Sales
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
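I have data for the past 45 days.
I want to do two things:
1) Group by Date, ObsAtHour and Region and get a cumulative sum of Sales, so that I end up with something like this: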
Date ObsAtHour Region Sales CumSales
1/1/2018 2 Asia 76 76
1/1/2018 2 Africa 5 5
1/1/2018 5 Asia 14 90
1/1/2018 5 Africa 10 15
2/1/2018 2 Asia 40 40
2/1/2018 2 Africa 1 1
2/1/2018 5 Asia 15 55
2/1/2018 5 Africa 20 21
2) Get a sales percentage that shows, for each region, what share of that day's sales had been made by each ObsAtHour. It would look like this:
Date ObsAtHour Region Sales CumSales Pct
1/1/2018 2 Asia 76 76 84%
1/1/2018 2 Africa 5 5 33%
1/1/2018 5 Asia 14 90 100%
1/1/2018 5 Africa 10 15 100%
2/1/2018 2 Asia 40 40 72%
2/1/2018 2 Africa 1 1 4.76%
2/1/2018 5 Asia 15 55 100%
2/1/2018 5 Africa 20 21 100%
Any help is much appreciated.
Answer 0 (score: 1)
Something like the following:
data have;
input Date:mmddyy10. ObsAtHour Region $ Sales;
format date mmddyy10.;
datalines;
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
;
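/* sort by obsathour as well so the running sum accumulates in time order */
proc sort data=have;
by date region obsathour;
run;
/* running (cumulative) sum of sales within each date and region */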
data have1;
format date mmddyy10.;
set have;
by date region;
if first.region then sumsales = sales;
else sumsales+sales;
run;
/* get the total sales per date and region from the initial table, join it
back, and calculate the percent */
proc sql;
select a.*, a.sumsales/b.tot_sales as per format=percent10.2
from have1 a
inner join
(select region, date, sum(sales) as tot_sales
from have
group by region, date) b
on a.region = b.region
and a.date = b.date
order by a.date, a.obsathour, a.region;
quit;
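The same result can probably also be produced without PROC SQL. The following is only a sketch of an alternative, not part of the approach above: it reads each date/region group twice in one data step (a "double DoW loop"), first to get the group total and then to accumulate and output. The names have_sorted, want_dow and tot_sales are made up for illustration.

```sas
/* assumption: have is the input data set created above */
proc sort data=have out=have_sorted;
  by date region obsathour;
run;

data want_dow;
  /* first pass: total sales for the current date/region group */
  do until (last.region);
    set have_sorted;
    by date region;
    tot_sales = sum(tot_sales, sales);
  end;
  /* second pass: re-read the same rows, accumulate, and write them out */
  do until (last.region);
    set have_sorted;
    by date region;
    sumsales = sum(sumsales, sales);
    per = sumsales / tot_sales;
    output;
  end;
  format per percent10.2;
run;
```

Because tot_sales and sumsales are not retained, they are reset to missing at the start of each data step iteration, that is, once per date/region group.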
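Answer 1 (score: 1)
The key to understanding the queries below is that each level of accumulation is called a tier. The tier is used as part of the self-join condition to restrict which rows are grouped for summing.
Data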
data have;
input Date :ddmmyy10. ObsAtHour Region $ Sales;
format Date yymmdd10.;
datalines;
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
run;
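Example queries
The second query (the percentage calculation) runs against the result of the first query (the cumulative calculation); however, the first query could instead be embedded in the second as a nested query.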
proc sql;
create table want(label='Cumulative within day up to obsathour') as
select
tiers.Date
, tiers.ObsAtHour
, tiers.Region
, Sum(case when have.ObsAtHour = tiers.ObsAtHour then have.Sales else 0 end) as SalesAtTier
, Sum(have.Sales) as CumSales
, Count(*) as CumCount
from
have
join
(select distinct Date, ObsAtHour, Region from have) as tiers
on
have.Date = tiers.Date
and have.Region = tiers.Region
and have.ObsAtHour <= tiers.ObsAtHour
group by
tiers.Date, tiers.Region, tiers.ObsAtHour
order
by Date, ObsAtHour, Region
;
create table want2 as
select
cum.Date
, cum.ObsAtHour
, cum.Region
, cum.SalesAtTier
, cum.CumSales
/* PERCENTw.d also reserves room for parentheses, so w=9 leaves space for 100.00% */
, cum.CumSales / Sum(cum.SalesAtTier) as fraction format=Percent9.2
from
want as cum
group by
cum.Date, cum.Region
order by
cum.Date, cum.ObsAtHour, cum.Region
;
quit;
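The nested form mentioned above could look roughly like this. It is a sketch that inlines the cumulative query as a subquery instead of storing it in want. The table name want_nested is made up, and the format width of 9 is my own choice to leave room for 100.00%.

```sas
proc sql;
  create table want_nested as
  select
    cum.Date
  , cum.ObsAtHour
  , cum.Region
  , cum.SalesAtTier
  , cum.CumSales
  , cum.CumSales / sum(cum.SalesAtTier) as fraction format=percent9.2
  from
    ( /* cumulative query, inlined instead of being stored as table want */
      select
        tiers.Date
      , tiers.ObsAtHour
      , tiers.Region
      , sum(case when have.ObsAtHour = tiers.ObsAtHour then have.Sales else 0 end) as SalesAtTier
      , sum(have.Sales) as CumSales
      from
        have
      join
        (select distinct Date, ObsAtHour, Region from have) as tiers
      on
        have.Date = tiers.Date
        and have.Region = tiers.Region
        and have.ObsAtHour <= tiers.ObsAtHour
      group by
        tiers.Date, tiers.Region, tiers.ObsAtHour
    ) as cum
  group by
    cum.Date, cum.Region
  order by
    cum.Date, cum.ObsAtHour, cum.Region
  ;
quit;
```

As in the two-step version, the outer query selects columns that are not in its GROUP BY, so PROC SQL remerges the per-group total back onto each row and writes a note about this to the log.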