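Calculating column-wise cumulative sums and percentages in SAS

Time: 2018-08-01 15:33:02

Tags: sql sas

I need some help writing a query in SAS PROC SQL.

Consider the following dataset, which stores sales from different regions in 3-hour blocks (this is only a subset; the actual data covers all 24 hours):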

 Date        ObsAtHour Region   Sales
 1/1/2018    2         Asia     76 
 1/1/2018    2         Africa   5 
 1/1/2018    5         Asia     14
 1/1/2018    5         Africa   10
 2/1/2018    2         Asia     40
 2/1/2018    2         Africa   1 
 2/1/2018    5         Asia     15
 2/1/2018    5         Africa   20

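I have data for the past 45 days.

I want to do two things:

1) Group by Date, ObsAtHour, and Region and get a running total of sales, so that I get something like this: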

 Date        ObsAtHour Region   Sales CumSales
 1/1/2018    2         Asia     76    76
 1/1/2018    2         Africa   5     5
 1/1/2018    5         Asia     14    90
 1/1/2018    5         Africa   10    15
 2/1/2018    2         Asia     40    40
 2/1/2018    2         Africa   1     1
 2/1/2018    5         Asia     15    55
 2/1/2018    5         Africa   20    21

2) Get a sales percentage that shows, for each region at each ObsAtHour, the share of that day's sales reached so far. It would look like this:

 Date        ObsAtHour Region   Sales CumSales  Pct
 1/1/2018    2         Asia     76    76        84%
 1/1/2018    2         Africa   5     5         33%
 1/1/2018    5         Asia     14    90        100%
 1/1/2018    5         Africa   10    15        100%
 2/1/2018    2         Asia     40    40        72% 
 2/1/2018    2         Africa   1     1         4.76%
 2/1/2018    5         Asia     15    55        100%
 2/1/2018    5         Africa   20    21        100% 

Many thanks for your help.

2 Answers:

Answer 0: (score: 1)

Something like the following:

data have;
input Date:mmddyy10.        ObsAtHour Region $  Sales;
format date mmddyy10.;
datalines;
1/1/2018    2         Asia     76 
1/1/2018    2         Africa   5 
1/1/2018    5         Asia     14
1/1/2018    5         Africa   10
2/1/2018    2         Asia     40
2/1/2018    2         Africa   1 
2/1/2018    5         Asia     15
2/1/2018    5         Africa   20
;
run;

/* sort so each date/region group is contiguous and ordered by hour */
proc sort data=have;
  by date region ObsAtHour;
run;

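/* running (cumulative) sum of sales within each date/region group */
data have1;
  format date mmddyy10.;
  set have;
  by date region;
  if first.region then sumsales = sales;  /* restart the total at each new group */
  else sumsales + sales;                  /* sum statement: sumsales is retained across rows */
run;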

/* get the total sales per group from the initial table, join it back,
   and calculate the percent */
proc sql;
  select a.*, a.sumsales / b.tot_sales as per format=percent10.2
  from have1 a
  inner join
    (select region, date, sum(sales) as tot_sales
     from have
     group by region, date) b
  on a.region = b.region
  and a.date = b.date
  order by a.date, a.ObsAtHour, a.region;
quit;
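
As an alternative to the sort, DATA step, and join above, both columns can be computed in a single DATA step with a double DoW loop. This is a sketch of a different technique, not the answerer's method. It assumes the have dataset created above, and the names have_sorted, want_dow, DayTotal, CumSales, and Pct are invented for this illustration.

```sas
/* Assumption: "have" exists as created above. */
proc sort data=have out=have_sorted;
  by Date Region ObsAtHour;
run;

data want_dow;
  /* First loop: read one Date/Region group and total its sales. */
  do until (last.Region);
    set have_sorted;
    by Date Region;
    DayTotal = sum(DayTotal, Sales);
  end;

  /* Second loop: re-read the same group, accumulate, and write each row. */
  do until (last.Region);
    set have_sorted;
    by Date Region;
    CumSales = sum(CumSales, Sales);
    Pct = CumSales / DayTotal;
    output;
  end;

  format Pct percent9.2;
run;
```

DayTotal and CumSales are not retained, so they start as missing each time the DATA step loops back to the top, which here happens once per Date/Region group. For the first date and Asia in the sample data, DayTotal is 90, so the two rows get CumSales 76 and 90.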

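Answer 1: (score: 1)

The key to understanding the query below is that each level of accumulation is called a tier. The tier is used as part of the self-join condition to restrict which rows are grouped for summing.

Data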

data have;
input Date :ddmmyy10. ObsAtHour Region $ Sales;
format Date yymmdd10.;
datalines;
 1/1/2018    2         Asia     76 
 1/1/2018    2         Africa   5 
 1/1/2018    5         Asia     14
 1/1/2018    5         Africa   10
 2/1/2018    2         Asia     40
 2/1/2018    2         Africa   1 
 2/1/2018    5         Asia     15
 2/1/2018    5         Africa   20
run;
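
To make the idea of tiers concrete, the sketch below lists the rows that the self-join pairs with each tier, before any grouping or summing. It assumes the have dataset just created; the column aliases TierHour and SourceHour are invented for this illustration.

```sas
proc sql;
  /* One output row per (tier, source row) pair: a source row belongs to a
     tier when it has the same Date and Region and an hour at or before
     the tier's hour. */
  select
    tiers.Date
  , tiers.Region
  , tiers.ObsAtHour as TierHour
  , have.ObsAtHour  as SourceHour
  , have.Sales
  from
    have
  join
    (select distinct Date, ObsAtHour, Region from have) as tiers
  on
    have.Date = tiers.Date
    and have.Region = tiers.Region
    and have.ObsAtHour <= tiers.ObsAtHour
  order by
    tiers.Date, tiers.Region, tiers.ObsAtHour, have.ObsAtHour
  ;
quit;
```

With the sample data, the tier for the first date, Asia, hour 5 is paired with the source rows at hours 2 and 5 (Sales 76 and 14). Summing Sales over those pairs gives the cumulative value 90 for that tier.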

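Example query

The second query (the percentage calculation) runs against the result of the first query (the cumulative calculation). Alternatively, the first query could be embedded in the second as a nested query.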

proc sql;
  create table want(label='Cumulative within day up to obsathour') as
  select 
    tiers.Date
  , tiers.ObsAtHour
  , tiers.Region
  , Sum(case when have.ObsAtHour = tiers.ObsAtHour then have.Sales else 0 end) as SalesAtTier
  , Sum(have.Sales) as CumSales
  , Count(*) as CumCount
  from
    have
  join
    (select distinct Date, ObsAtHour, Region from have) as tiers
  on
    have.Date = tiers.Date
    and have.Region = tiers.Region
    and have.ObsAtHour <= tiers.ObsAtHour
  group by
    tiers.Date, tiers.Region, tiers.ObsAtHour
  order by
    Date, ObsAtHour, Region
  ;

  create table want2 as
  select
    cum.Date
  , cum.ObsAtHour
  , cum.Region
  , cum.SalesAtTier
  , cum.CumSales
  , cum.CumSales / Sum(cum.SalesAtTier) as fraction format=Percent7.2
  from
    want as cum
  group by
    cum.Date, cum.Region
  order by 
    cum.Date, cum.ObsAtHour, cum.Region
  ;
quit;
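
The nested form mentioned above could look roughly like this. It is a sketch that inlines the cumulative query as a subquery of the percentage query, assuming the same have dataset; the output table name want_nested is invented for this illustration.

```sas
proc sql;
  create table want_nested as
  select
    cum.Date
  , cum.ObsAtHour
  , cum.Region
  , cum.SalesAtTier
  , cum.CumSales
  , cum.CumSales / sum(cum.SalesAtTier) as fraction format=percent7.2
  from
    ( /* cumulative query, inlined */
      select
        tiers.Date
      , tiers.ObsAtHour
      , tiers.Region
      , sum(case when have.ObsAtHour = tiers.ObsAtHour then have.Sales else 0 end) as SalesAtTier
      , sum(have.Sales) as CumSales
      from
        have
      join
        (select distinct Date, ObsAtHour, Region from have) as tiers
      on
        have.Date = tiers.Date
        and have.Region = tiers.Region
        and have.ObsAtHour <= tiers.ObsAtHour
      group by
        tiers.Date, tiers.Region, tiers.ObsAtHour
    ) as cum
  group by
    cum.Date, cum.Region
  order by
    cum.Date, cum.ObsAtHour, cum.Region
  ;
quit;
```

As in the two-step version, the outer query selects detail columns alongside a group-level sum, so SAS remerges the summary statistic back onto the detail rows and writes a note about that to the log.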