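I need to create a summary dataset/report that tracks the flow of these purchases. I have a dataset that gives the sign-up date for the overall service, plus 9 variables that give the purchase dates of different add-on products. If an add-on variable's date matches the sign-up date, that add-on product was included in the sign-up package. Any add-on purchase date after the sign-up date is a product bought during the history of the active account. This is what the data looks like: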
data have;
  /* ID is read as a plain number; every date uses the ANYDTDTE informat.
     The colon modifier stops each read at the next blank.
     Ambiguous values such as 4/7/14 are resolved by the DATESTYLE= option;
     month/day/year order is assumed here. */
  input ID
        signup_DT               : anydtdte9.
        preferredhd_tv_estbd_dt : anydtdte9.
        ultimate_estbd_dt       : anydtdte9.
        quant_estbd_dt          : anydtdte9.
        FullyLoaded_estbd_dt    : anydtdte9.
        HB_estbd_dt             : anydtdte9.
        Cin_estbd_dt            : anydtdte9.
        time_estbd_dt           : anydtdte9.
        router_estbd_dt         : anydtdte9.
        internet_estbd_dt       : anydtdte9.;
  format signup_DT preferredhd_tv_estbd_dt ultimate_estbd_dt quant_estbd_dt
         FullyLoaded_estbd_dt HB_estbd_dt Cin_estbd_dt
         time_estbd_dt router_estbd_dt internet_estbd_dt date9.;
datalines;
98663699 4/7/14 4/9/14 4/7/14 9/12/14 10/15/14 7/7/14 4/7/14 4/7/14 4/12/14 .
33663798 4/11/14 . 4/11/14 . 4/11/14 4/11/14 4/11/14 4/11/14 6/11/14 7/15/14
43663463 5/12/14 5/12/14 5/12/14 9/5/14 9/17/14 . . . . .
77661437 5/16/14 . 5/16/14 . 10/31/14 . 5/16/14 5/16/14 11/16/14 .
85662295 5/29/14 . . 5/29/14 . 6/12/14 . . 11/16/14 .
36656756 6/4/14 . . . 6/4/14 6/4/14 6/12/14 6/4/14 6/4/14 12/4/14
67662646 6/14/14 . 6/14/14 8/31/14 . . 6/17/14 6/14/14 . 6/22/14
55663786 6/26/14 . . . 8/14/14 6/26/14 7/8/14 6/26/14 11/30/14 .
44663191 8/21/14 . 9/30/14 . . . . 1/12/15 . 10/31/14
;
The variables I want to produce are:
If I take only April, the output I am looking for is this:
data want;
  length Sign_up_Month $5
         Sign_up_count 8
         Initial_Products_total 8
         Products $25
         Prod_Purchased_on_Signup 8
         AddPro_April_After_SU 8
         May 8 June 8 July 8 August 8 September 8 October 8;
  input Sign_up_Month $
        Sign_up_count
        Initial_Products_total
        Products $
        Prod_Purchased_on_Signup
        AddPro_April_After_SU
        May June July August September October;
datalines;
April 2 8 preferredhd_tv_estbd_dt 1
April 2 8 ultimate_estbd_dt 2
April 2 8 quant_estbd_dt 1
April 2 8 FullyLoaded_estbd_dt 1 1
April 2 8 HB_estbd_dt 1
April 2 8 Cin_estbd_dt 2
April 2 8 time_estbd_dt 2
April 2 8 router_estbd_dt 1 1
April 2 8 internet_estbd_dt 1
;
Below is the code for the first three variables of the output dataset: signup_month, Sign_up_count, Initial_Products_total.
proc sort data=have;
  by ID signup_DT;
run;

proc transpose data=have out=have (drop=_LABEL_);
  by ID signup_DT;
run;

data have;
  set have;
  if signup_DT = COL1 then Initial_flag = 1;
run;

proc sql;
  create table have as
  select distinct
         count(distinct ID) as Sign_up_count,
         month(signup_DT)   as signup_month,
         sum(Initial_flag)  as Initial_Products
  from have
  group by month(signup_DT);
quit;
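I am having trouble creating the remaining variables: Prod_Purchased_on_Signup, AddPro_April_After_SU, and the counts by month.
I have been trying to use arrays to accomplish this, but I keep running into trouble.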
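One way the array approach mentioned above could look, as a minimal sketch rather than a finished solution. It assumes the original wide `have` data set exactly as created by the first DATA step (before the transpose step overwrites it). The names `long`, `product`, `purchase_dt`, `on_signup`, and `months_after` are illustrative and do not come from the original post.

```sas
/* Sketch: turn the 9 add-on date columns into one row per purchase and
   label each purchase relative to the sign-up date.
   Assumes the wide HAVE data set; all new names are hypothetical. */
data long (keep = ID signup_DT product purchase_dt on_signup months_after);
  set have;
  array dts[*] preferredhd_tv_estbd_dt ultimate_estbd_dt quant_estbd_dt
               FullyLoaded_estbd_dt HB_estbd_dt Cin_estbd_dt
               time_estbd_dt router_estbd_dt internet_estbd_dt;
  length product $32;
  format purchase_dt date9.;
  do i = 1 to dim(dts);
    /* Skip add-ons that were never purchased */
    if not missing(dts[i]) then do;
      product      = vname(dts[i]);          /* name of the source column */
      purchase_dt  = dts[i];
      on_signup    = (purchase_dt = signup_DT);  /* 1 = part of the sign-up package */
      /* 0 = same calendar month as sign-up, 1 = the following month, ... */
      months_after = intck('month', signup_DT, purchase_dt);
      output;
    end;
  end;
run;
```

With the data in this long shape, the per-product counts become ordinary grouped counts on `product`, `on_signup`, and `months_after`, which avoids keeping one counter variable per calendar month.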
Answer 0 (score: 1)
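I am not sure from your question at what level of aggregation you want your counts. However, if you are looking for a summary for each distinct ID and sign-up date, here is a solution. It requires your original input to be sorted by ID signup_DT.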
proc transpose data = have out = trans;
  by ID signup_DT;
run;

/* Sort for BY-group processing and a consistent product name order */
proc sort data = trans;
  by ID signup_DT _NAME_;
run;

data products (drop = _NAME_ COL1 i);
  set trans;
  /* For BY-group processing */
  by ID signup_DT;
  /* Get the sign-up month as a word */
  signup_month = put(signup_DT, monname.);
  /* Make the product list variable long enough to prevent truncation */
  length Products $400;
  /* Array reference for the later month counts so we can loop */
  array som[5] signups_month0-signups_month4;
  /* Retain so we can add to the variables as we go down through the group */
  retain Products Sign_up_count signups_month0-signups_month4;
  /* Reset our new variables at the start of each group */
  if first.signup_DT then do;
    Products = "";
    Sign_up_count = 0;
    do i = 1 to 5;
      som[i] = 0;
    end;
  end;
  /* Add to the list and count of sign-up products */
  if signup_DT = COL1 then do;
    Sign_up_count + 1;
    Products = catx(" ", Products, _NAME_);
  end;
  /* Otherwise add to the later month counts by checking the months separating
     the dates. Products never purchased (missing COL1) are skipped. Purchases
     more than 4 months after sign-up fall outside the array and are not counted. */
  else if not missing(COL1) then do i = 1 to 5;
    if intck("month", signup_DT, COL1) = i - 1 then som[i] + 1;
  end;
  /* Only output once we have completed a group */
  if last.signup_DT and Sign_up_count then output;
run;
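If the counts are wanted per sign-up month and product, as in the layout shown in the question, the transposed data can also be aggregated directly. The following is a hedged sketch, not part of the original answer: it assumes the `trans` data set created above (one row per ID and product, with the purchase date in COL1), and the names `by_product`, `same_month_after`, and `plus_1_month` through `plus_6_month` are hypothetical. It groups on the month number only, so it also assumes all sign-ups fall within a single year.

```sas
/* Sketch: one row per sign-up month and product.
   In PROC SQL a comparison evaluates to 1 or 0, so SUM(condition) counts matches.
   Assumes TRANS from the PROC TRANSPOSE step; output names are illustrative. */
proc sql;
  create table by_product as
  select month(signup_DT) as signup_month,
         _NAME_           as Products,
         /* bought as part of the sign-up package */
         sum(COL1 = signup_DT) as Prod_Purchased_on_Signup,
         /* bought after sign-up but still in the sign-up calendar month */
         sum(COL1 > signup_DT and intck('month', signup_DT, COL1) = 0)
                               as same_month_after,
         /* bought 1 to 6 calendar months after the sign-up month */
         sum(intck('month', signup_DT, COL1) = 1) as plus_1_month,
         sum(intck('month', signup_DT, COL1) = 2) as plus_2_month,
         sum(intck('month', signup_DT, COL1) = 3) as plus_3_month,
         sum(intck('month', signup_DT, COL1) = 4) as plus_4_month,
         sum(intck('month', signup_DT, COL1) = 5) as plus_5_month,
         sum(intck('month', signup_DT, COL1) = 6) as plus_6_month
  from trans
  where COL1 is not missing      /* ignore add-ons never purchased */
  group by calculated signup_month, _NAME_;
quit;
```

The later-month columns are relative to the sign-up month rather than fixed calendar months, so for the April cohort `plus_1_month` corresponds to May, `plus_2_month` to June, and so on. A product that nobody in a cohort ever purchased produces no row for that cohort, because its missing dates are filtered out.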