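Calculations in SAS

Time: 2018-05-22 10:43:17

Tags: sas proc-sql

I have a dataset named stores. For each store, I want to extract total_sales (the sum of Retail_Price), the proportion of sales, and the cumulative proportion of sales in SAS.

Sample dataset: stores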

Date       Store_Postcode   Retail_Price    month   Distance
08/31/2013  CR7 8LE              470           8    7057.8
10/26/2013  CR7 8LE              640           10   7057.8
08/19/2013  CR7 8LE              500           8    7057.8
08/17/2013  E2 0RY               365           8    1702.2
09/22/2013  W4 3PH               395.5         12   2522
06/19/2013  W4 3PH               360.5         6    1280.9
11/15/2013  W10 6HQ              475           12   3213.5
06/20/2013  W10 6HQ              500           1    3213.5
09/18/2013  E7 8NW               315           9    2154.8
10/23/2013  E7 8NW               570           10   5777.9
11/18/2013  W10 6HQ              455           11   3213.5
08/21/2013  W10 6HQ              530           8    3213.5

The code I tried:

Proc sql;
Create table work.Top_sellers as
Select Store_postcode as Stores,
       SUM(Retail_price) as Total_Sales,
       Round((Retail_price/Sum(Retail_price)),0.01) as Proportion_of_sales
From work.stores
Group by Store_postcode
Order by total_sales;
Quit;

I don't know how to compute a cumulative variable in proc sql. Please help me improve my code!

2 answers:

Answer 0: (score: 1)

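Computing a cumulative result in SQL requires the data to have an explicit, unique, ordered key. The query is a reflexive join (a self-join) whose join condition is a "triangle" criterion: each row is matched with every row whose key is less than or equal to its own, and summing over those matches gives the cumulative value.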

data have;
  do id = 100 to 120;
    sales = ceil (10 + 25 * ranuni(123));
    output;
  end;
run;

proc sql;
  create table want as
  select 
    have1.id
  , have1.sales
  , sum(have2.sales) as sales_cusum
  from
    have as have1
  join
    have as have2
  on 
    have1.id >= have2.id  /* 'triangle' criteria */
  group by
    have1.id, have1.sales
  order by
    have1.id
  ;
quit;

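A second approach recomputes the cusum for each row with a correlated subquery:

proc sql;
  create table want as
  select h1.id, h1.sales,
   /* for each outer row, sum the sales of all rows up to and including it */
   ( select sum(h2.sales)
     from have as h2      /* alias h2 instead of the reserved word INNER */
     where h2.id <= h1.id
   )
   as cusum
  from
   have as h1;
quit;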
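
Applied to the question's data, the triangle join could look roughly like the sketch below. It assumes the table is work.stores with the columns Store_Postcode and Retail_Price from the sample, and it uses the postcode as the unique ordered key. The intermediate table name store_totals and the column name Cumulative_Proportion are made up for illustration.

```sas
proc sql;
  /* Step 1: one row per store, with its total and its share of all sales */
  create table work.store_totals as
  select
      Store_Postcode as Stores
    , sum(Retail_Price) as Total_Sales
    , sum(Retail_Price) / (select sum(Retail_Price) from work.stores)
        as Proportion_of_Sales
  from work.stores
  group by Store_Postcode;

  /* Step 2: triangle self-join on the per-store rows;
     Stores is unique after step 1, so it serves as the ordered key */
  create table work.top_sellers as
  select
      a.Stores
    , a.Total_Sales
    , a.Proportion_of_Sales
    , sum(b.Proportion_of_Sales) as Cumulative_Proportion
  from work.store_totals as a
  join work.store_totals as b
    on a.Stores >= b.Stores
  group by a.Stores, a.Total_Sales, a.Proportion_of_Sales
  order by a.Stores;
quit;
```

The cumulative proportion here accumulates in postcode order. To accumulate in order of Total_Sales instead, the join condition would have to compare Total_Sales and break ties on Stores, because totals are not guaranteed to be unique.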

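Answer 1: (score: 1)

I changed my mind: the CDF is a different calculation. Here is how to do this with a data step. First compute the cumulative total (I used a data step here, but if you have SAS/ETS you could use PROC EXPAND instead).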

*sort demo data;
proc sort data=sashelp.shoes out=shoes;
by region sales;
run;

data cTotal last (keep = region cTotal);
set shoes;
by region;

*calculate running total;
if first.region then cTotal=0;
cTotal = cTotal + sales;

*output records, everything to cTotal but only the last record which is total to Last dataset;
if last.region then output last;
output cTotal;

retain cTotal;
run;

*merge in results and calculate percentages;
data calcs;
merge cTotal Last (rename=(cTotal=Total));
by region;

percent = cTotal/Total;
run;
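
For the PROC EXPAND route mentioned above, a minimal sketch might look like this. It assumes SAS/ETS is licensed; the output name cTotal_expand is hypothetical. Only the running total is produced here, so the percentage step would still be needed afterwards.

```sas
/* input must be sorted by the BY variable */
proc sort data=sashelp.shoes out=shoes;
  by region sales;
run;

/* method=none: no interpolation, only apply the transformation;
   cusum: cumulative sum, restarted for each BY group */
proc expand data=shoes out=cTotal_expand method=none;
  by region;
  convert sales = cTotal / transformout=(cusum);
run;
```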

If you need a more efficient solution, I would try a DoW loop.
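
A double DoW loop along these lines is one way to sketch that idea. It is not taken from the answer itself; the output name calcs_dow is hypothetical, and it reuses the sashelp.shoes demo data sorted as above. The first loop reads a whole region to get its total, and the second loop re-reads the same rows to build the running total and the percentage in a single data step.

```sas
proc sort data=sashelp.shoes out=shoes;
  by region sales;
run;

data calcs_dow;
  /* Pass 1: read every row of the current region and add up its total.
     Total and cTotal are not retained, so they start as missing
     for each region (one data step iteration per region). */
  do _n_ = 1 by 1 until (last.region);
    set shoes;
    by region;
    Total = sum(Total, sales);
  end;

  /* Pass 2: re-read the same _n_ rows, accumulate and output each one */
  do _n_ = 1 to _n_;
    set shoes;
    cTotal = sum(cTotal, sales);
    percent = cTotal / Total;
    output;
  end;
run;
```

Compared with the two data steps plus merge, this avoids writing and re-merging the intermediate datasets, at the cost of reading the input twice inside one step.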