我有一个名为商店的数据集。我想提取 total_sales(retail_price), 每个商店的销售比例和累计销售比例 SAS。
样本数据集: - 商店
Date Store_Postcode Retail_Price month Distance
08/31/2013 CR7 8LE 470 8 7057.8
10/26/2013 CR7 8LE 640 10 7057.8
08/19/2013 CR7 8LE 500 8 7057.8
08/17/2013 E2 0RY 365 8 1702.2
09/22/2013 W4 3PH 395.5 12 2522
06/19/2013 W4 3PH 360.5 6 1280.9
11/15/2013 W10 6HQ 475 12 3213.5
06/20/2013 W10 6HQ 500 1 3213.5
09/18/2013 E7 8NW 315 9 2154.8
10/23/2013 E7 8NW 570 10 5777.9
11/18/2013 W10 6HQ 455 11 3213.5
08/21/2013 W10 6HQ 530 8 3213.5
我试过的代码: -
Proc sql;
Create table work.Top_sellers as
Select Store_postcode as Stores,SUM(Retail_price) as Total_Sales,Round((Retail_price/Sum(Retail_price)),0.01) as
Proportion_of_sales
From work.stores
Group by Store_postcode
Order by total_sales;
Quit;
我不知道如何计算proc sql中的累积变量... 请帮我改进我的代码!!
答案 0 :(得分:1)
在SQL中计算累积结果要求数据具有显式唯一有序键,并且查询涉及与“三角形”的自反连接。累积方面的标准。
data have;
do id = 100 to 120;
sales = ceil (10 + 25 * ranuni(123));
output;
end;
run;
proc sql;
create table want as
select
have1.id
, have1.sales
, sum(have2.sales) as sales_cusum
from
have as have1
join
have as have2
on
have1.id >= have2.id /* 'triangle' criteria */
group by
have1.id, have1.sales
order by
have1.id
;
quit;
第二种方法是逐行重新计算cusum
proc sql;
create table want as
select have.id, have.sales,
( select sum(inner.sales)
from (select * from have) as inner
where inner.id <= have.id
)
as cusum
from
have;
答案 1 :(得分:1)
我改变主意,CDF是一个不同的计算方法。 以下是通过数据步骤执行此操作的方法。首先计算累计总数(我在这里使用了数据步骤,但如果你有SAS / ETS,我可以使用PROC EXPAND)。
*sort demo data;
proc sort data=sashelp.shoes out=shoes;
by region sales;
run;
data cTotal last (keep = region cTotal);
set shoes;
by region;
*calculate running total;
if first.region then cTotal=0;
cTotal = cTotal + sales;
*output records, everything to cTotal but only the last record which is total to Last dataset;
if last.region then output last;
output cTotal;
retain cTotal;
run;
*merge in results and calculate percentages;
data calcs;
merge cTotal Last (rename=cTotal=Total);
by region;
percent = cTotal/Total;
run;
如果您需要更高效的解决方案,我会尝试使用DoW解决方案。