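I have the following dataset:
I need to add two new columns. For each customer, the first subtracts row 1 from row 2, giving the number of days after which the customer renewed their membership. The second counts how many times the customer has renewed their membership, as a count starting from 0.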
Row - Customer - Renew Date - Type of Renewal - Days_Since - Prev_Renewal
1 - A - June 10, 2010 - X
2 - A - May 01, 2011 - Y
3 - B - Jan 05, 2010 - Y
4 - B - Dec 10, 2010 - Z
5 - B - Dec 10, 2011 - X
This is the code I am using right now. Is there a way to combine these two sets of queries into one?
data have;
    informat renew_date ANYDTDTE.;
    format renew_date DATE9.;
    infile datalines dlm='-';
    input Row Customer $ Renew_Date Renewal_Type $;
    datalines;
1 - A - June 10, 2010 - X
2 - A - May 01, 2011 - Y
3 - B - Jan 05, 2010 - Y
4 - B - Dec 10, 2010 - Z
5 - B - Dec 10, 2011 - X
;;;;
run;
data want;
    set have;
    by customer;
    retain prev_days;                      /* retain the value of prev_days from one row to the next */
    if first.customer then days_since=0;   /* initialize days_since to zero for each customer's first record */
    else days_since=renew_date-prev_days;  /* otherwise set it to the difference */
    output;                                /* output the current record */
    prev_days=renew_date;                  /* now change prev_days to the renewal date so the next record has it */
run;
data want1;
    set have;
    by customer;
    retain prev_renewal;
    if first.customer then prev_renewal=0;
    else prev_renewal=prev_renewal+1;
    output;
run;
Thanks
Answer 0 (score: 0)
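This is not SQL but DATA step / Base SAS code, the proprietary (4GL) language of SAS Institute, which actually predates SQL.
Regarding your program, it is worth pointing out that the data must be sorted or indexed by the BY variable (customer in this example) before you use a BY statement. In this case, your datalines are already in the correct order.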
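If the input were not already in order, a sort step could come first. This is a minimal sketch, assuming the dataset is named have as in the question. Adding renew_date as a second sort key is my own assumption, so that each customer's renewals are processed chronologically:

```sas
/* Sort by customer so the BY statement in the later DATA step works. */
/* renew_date as a secondary key is an assumption: it keeps each      */
/* customer's renewals in chronological order.                        */
proc sort data=have;
    by customer renew_date;
run;
```

Without the sort (or an index on customer), a DATA step with BY customer stops with an error as soon as it meets an out-of-order row.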
Here is the combined code you need:
data want (drop=prev_days);
    set have;
    by customer;
    retain prev_days;
    if first.customer then do;
        days_since=0;
        prev_renewal=0;
    end;
    else do;
        days_since=renew_date-prev_days;
        prev_renewal+1; /* IMPLIED retain - special syntax */
    end;
    output;
    prev_days=renew_date;
run;
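As an alternative to carrying prev_days by hand, the same result can probably be had with the DIF function, which returns the current value minus the value from the previous call. This is only a sketch: the output name want_dif is made up for illustration, and it assumes have is ordered by customer and then by renewal date.

```sas
data want_dif;   /* hypothetical output name */
    set have;
    by customer;
    /* Call DIF on every row, not inside the IF, so its internal queue */
    /* always holds the immediately preceding row's renew_date.        */
    days_since = dif(renew_date);
    if first.customer then do;
        days_since = 0;     /* overwrite the cross-customer difference */
        prev_renewal = 0;   /* restart the renewal count */
    end;
    else prev_renewal + 1;  /* sum statement: implied retain */
run;
```

With the sample data, either version should give days_since of 0 and 325 for customer A and 0, 339 and 365 for customer B, with prev_renewal running 0, 1 for A and 0, 1, 2 for B.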