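I have a dataset of customer IDs and purchases, and I need to collapse each customer's unique purchases into a single observation (one row per customer).
For example: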
CustID Purchase1 Purchase2 Purchase3 Purchase4
J Bike Shoes Shirt Pants
J Shirt Pants null null
J Bike Helmet Pants null
K Shoes Helmet null null
L Basketball Shoes Shirt null
L Bike Helmet null null
I would like my output to look like this:
CustID P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 PN
J Bike Shoes Shirt Pants Helmet null null null null null null null
K Shoes Helmet null null ........ null
L Basketball Shoes Shirt Bike Helmet null .... null
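I could set the maximum P to a very large value so that I never hit it, but bonus points if someone can show me how to scan the dataset and set the maximum P to the largest number of unique purchases made by any single customer.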
Answer:
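How about something like this? Put all purchases in the same column, use NODUPKEY to remove each customer's repeated purchases, then transpose back to one row per customer (PROC TRANSPOSE automatically creates as many columns as needed, named COL1, COL2, etc.).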
/*sample dataset*/
data want;
infile datalines delimiter=' ';
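length CustID $8 Purchase1-Purchase4 $20; /*list input defaults to length 8, which would truncate "Basketball"*/
input CustID $ Purchase1 $ Purchase2 $ Purchase3 $ Purchase4 $;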
datalines;
J Bike Shoes Shirt Pants
J Shirt Pants null null
J Bike Helmet Pants null
K Shoes Helmet null null
L Basketball Shoes Shirt null
L Bike Helmet null null
;
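/*stack every purchase into a single column*/
data want01;
length purchall $200; /*declare the new variable (a LENGTH on "purchase" would add an extra blank PURCHASE variable to the purchase: array)*/
set want;
array purc[*] purchase:; /*Purchase1-Purchase4*/
do i=1 to dim(purc);
PURCHALL=purc[i];
output; /*one row per customer/purchase pair*/
end;
keep custid purchall;
run;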
/*delete repeated purchases, blanks and 'null' placeholders*/
proc sort data=want01 out=want02 nodupkey;
where purchall not in ('' 'null');
by custid purchall;
run;
/*return to one row per customer (columns COL1, COL2, ...)*/
proc transpose data=want02 out=want03(drop=_name_);
by custid;
var purchall;
run;
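Sorting BY custid purchall leaves each customer's purchases in alphabetical order (J comes out as Bike Helmet Pants Shirt Shoes), whereas the desired output keeps the order of first appearance. A minimal sketch, assuming you want that order: record a running sequence number while stacking, let NODUPKEY keep the first occurrence of each purchase (with PROC SORT's default EQUALS option, the first observation in input order is kept), then re-sort by that sequence before transposing. The dataset names WANT01B-WANT03B, the variable SEQ and the PREFIX=P choice are my own, not part of the answer above.

```sas
/*stack purchases, remembering where each one first appeared*/
data want01b;
length purchall $20;
set want;
array purc[*] purchase1-purchase4;
do i=1 to dim(purc);
purchall=purc[i];
seq+1; /*sum statement: retained running counter across all input rows*/
output;
end;
keep custid purchall seq;
run;

/*keep only the first occurrence of each purchase per customer*/
proc sort data=want01b out=want02b nodupkey;
where purchall not in ('' 'null');
by custid purchall;
run;

/*restore first-appearance order within each customer*/
proc sort data=want02b;
by custid seq;
run;

/*one row per customer, columns named P1, P2, ...*/
proc transpose data=want02b out=want03b(drop=_name_) prefix=P;
by custid;
var purchall;
run;
```

For the sample data this yields J = Bike Shoes Shirt Pants Helmet and L = Basketball Shoes Shirt Bike Helmet, matching the layout in the question.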
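If you only want the maximum number of unique purchases, just run PROC FREQ on the WANT02 dataset (the one holding the unique purchases, without blanks or nulls); the largest count is the number of columns you need.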
proc freq data=want02 noprint;
table custid /out=want04;
run;
WANT04 will contain (PROC FREQ's OUT= dataset also has a PERCENT column, omitted here):
CUSTID | COUNT |
----------------
J | 5 |
K | 2 |
L | 5 |
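For the bonus part of the question: if you would rather build the wide dataset in a DATA step with an explicitly sized array instead of relying on PROC TRANSPOSE, one option is to read the largest COUNT from WANT04 into a macro variable and use it as the array dimension. This is a sketch assuming WANT02 and WANT04 from above (WANT02 is already sorted by CUSTID); the macro variable MAXP and the output dataset WANT05 are illustrative names.

```sas
/*largest number of unique purchases for any single customer*/
proc sql noprint;
select max(count) into :maxp trimmed
from want04;
quit;
%put NOTE: maximum unique purchases per customer = &maxp;

/*one row per customer with exactly &maxp purchase columns*/
data want05;
set want02;
by custid;
length p1-p&maxp $20;
array p[&maxp] $ p1-p&maxp;
retain p1-p&maxp;
if first.custid then do;
call missing(of p[*]); /*clear the previous customer's purchases*/
n=0;
end;
n+1;
p[n]=purchall;
if last.custid then output; /*write one row once the customer is complete*/
keep custid p1-p&maxp;
run;
```

With the sample data MAXP resolves to 5, so WANT05 gets P1-P5 and no column is wasted.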