Base SAS: Transposing counts by date

Time: 2014-09-25 15:04:13

Tags: arrays date sas transpose

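I need to create a summary dataset/report that tracks the flow of these purchases. I have a dataset that gives the signup date for the overall service, plus 9 variables that give the purchase dates of different add-on products. If an add-on variable's date matches the signup date, that add-on product was included in the signup package. Any add-on purchase date after the signup date is a product bought during the history of the active account. This is what it looks like: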

data have ;
 length ID 8 
  signup_DT 8 preferredhd_tv_estbd_dt 8 
  ultimate_estbd_dt 8   quant_estbd_dt 8    
  FullyLoaded_estbd_dt 8    HB_estbd_dt Cin_estbd_dt 8  
  time_estbd_dt 8   router_estbd_dt internet_estbd_dt 8; 
 /* List input: ID is read as a plain number, every date with its own informat */
 INPUT ID 
  signup_DT  : anydtdte9.  preferredhd_tv_estbd_dt  : anydtdte9.    
  ultimate_estbd_dt  : anydtdte9.   quant_estbd_dt  : anydtdte9.    
  FullyLoaded_estbd_dt  : anydtdte9.    HB_estbd_dt  : anydtdte9.  Cin_estbd_dt  : anydtdte9.  
  time_estbd_dt  : anydtdte9.   router_estbd_dt  : anydtdte9.  internet_estbd_dt  : anydtdte9. ;
 format   signup_DT  preferredhd_tv_estbd_dt    
  ultimate_estbd_dt     quant_estbd_dt  
  FullyLoaded_estbd_dt  HB_estbd_dt Cin_estbd_dt    
  time_estbd_dt     router_estbd_dt internet_estbd_dt   date9.; 
datalines; 
98663699    4/7/14  4/9/14  4/7/14  9/12/14 10/15/14 7/7/14 4/7/14  4/7/14  4/12/14 .
33663798    4/11/14 .   4/11/14 .   4/11/14 4/11/14 4/11/14 4/11/14 6/11/14 7/15/14
43663463    5/12/14 5/12/14 5/12/14 9/5/14  9/17/14 .   .   .   .   .
77661437    5/16/14 .   5/16/14 .   10/31/14    .   5/16/14 5/16/14 11/16/14    .
85662295    5/29/14 .   .   5/29/14 .   6/12/14 .   .   11/16/14    .
36656756    6/4/14  .   .   .   6/4/14  6/4/14  6/12/14 6/4/14  6/4/14  12/4/14
67662646    6/14/14 .   6/14/14 8/31/14 .   .   6/17/14 6/14/14 .   6/22/14
55663786    6/26/14 .   .   .   8/14/14 6/26/14 7/8/14  6/26/14 11/30/14    .
44663191    8/21/14 .   9/30/14 .   .   .   .   1/12/15 .   10/31/14
;  

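The variables I want to produce are:

  1. Signup month (easy to do)
  2. Total number of signups in that month (easy to do)
  3. Total number of add-on products included with signup
  4. A variable listing all the add-on product names (transposed from the original dataset)
  5. The count of the different products purchased on the signup date
  6. The count of add-on products purchased after the signup date but within the same month as the signup date
  7. Month variables holding the count of add-on products purchased in each later month

If I take only April, the output I am looking for looks like this: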

     data want ;
         length 
          Sign_up_Month $5
          Sign_up_count 8
          Initial_Products_total    8
          Products  $25
          Prod_Purchased_on_Signup  8
          AddPro_ April_After_SU 8
          May 8 June 8  July 8  August 8    September 8 October 8; 
         INPUT Sign_up_Month    $   
          Sign_up_count 
          Initial_Products_total    
          Products  $
          Prod_Purchased_on_Signup  
          AddPro_ April_After_SU    
          May   June    July    August  September   October; 
        datalines; 
        April   2   8   preferredhd_tv_estbd_dt     1                       
        April   2   8   ultimate_estbd_dt           2                           
        April   2   8   quant_estbd_dt                          1   
        April   2   8   FullyLoaded_estbd_dt    1                           1
        April   2   8   HB_estbd_dt            1                            
        April   2   8   Cin_estbd_dt    2                           
        April   2   8   time_estbd_dt   2                           
        April   2   8   router_estbd_dt     1       1               
        April   2   8   internet_estbd_dt                   1           
        ;
    

Below is the code for the first three variables of the output dataset: signup_month, Sign_up_count, and Initial_Products_total (named Initial_Products in the code).

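    /* Sort so PROC TRANSPOSE can process BY groups */
    proc sort data=have;
        by ID signup_DT;
    run;

    /* One row per ID, signup date and product; COL1 holds the purchase date */
    proc transpose data=have out=have (drop=_LABEL_);
        by ID signup_DT;
    run;

    /* Flag products bought on the signup date */
    data have;
        set have;
        if signup_DT = COL1 then Initial_flag = 1;
    run;

    /* Signups and initial products per signup month */
    proc sql;
        create table have as
        select distinct
            count(distinct ID) as Sign_up_count,
            month(signup_DT) as signup_month,
            sum(Initial_flag) as Initial_Products
        from have
        group by month(signup_DT);
    quit;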
    

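I am having trouble creating the remaining variables: Prod_Purchased_on_Signup, AddPro_ April_After_SU, and the counts by month.

I have been trying to use arrays to accomplish this, but I am running into trouble.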

1 Answer:

Answer 0 (score: 1)

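I'm not sure from your question at what level of aggregation you want your counts. However, if you are looking for a summary for each distinct ID and signup date, here is one solution. It requires your original input to be sorted by ID signup_DT.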
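As a minimal sketch of that prerequisite, assuming `have` is still the original wide dataset from the question (the question's own code overwrites `have`, so its DATA step would need to be rerun first):

```sas
/* Sort the original wide data so the BY statements below work */
proc sort data = have;
    by ID signup_DT;
run;
```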

proc transpose 
    data = have 
    out = trans; 
    by ID signup_DT; 
run; 

/* Sort for by group processing and regular name order */
proc sort data = trans;
    by ID signup_DT _NAME_;
run;

data products (drop = _NAME_ COL1 i);
    set trans;
    /* For by group processing */
    by ID signup_DT;
    /* Get the signup month as a word */
    signup_month = put(signup_DT, monname.);
    /* Make the product list variable to prevent truncation */
    length Products $400.;
    /* Retain so we can add to the variables as we go down through the group */
    retain Products Sign_up_count signups_month0-signups_month4;
    /* Set up array reference for later month counts so we can loop */
    array som[5] signups_month0-signups_month4;
    /* Reset our new variables */
    if first.signup_DT then do;
        Products = "";
        Sign_up_count = 0;
        do i = 1 to 5;
            som[i] = 0;
        end;
    end;
    /* Add to the list and count of sign up products */
    if signup_DT = COL1 then do;
        Sign_up_count + 1;
        Products = catx(" ", Products, _NAME_);
    end;
    /* Otherwise add to the later month counts by checking months separating the dates */
    else do i = 1 to 5;
        if intck("month", signup_DT, COL1) = i - 1 then som[i] + 1;
    end;
    /* Only output once we have completed a group */
    if last.signup_DT and Sign_up_count  then output;
run;