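I have some consumer data that records where each consumer shopped. The full data has many more rows with different consumer IDs, but for testing purposes I have included a representative part of it here: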
ConsumerID Retailer Product_Code Shopping_Date
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
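Basically, if the retailer is different or the product code is different, the consumer has switched retailer or product. For example, the first and second rows have the same retailer (and product), so they are not counted in the new dataset. The new dataset I want to create should contain FromRetailer and ToRetailer. Rows with the same Shopping_Date are counted as one trip, but the final result should still show all 4 transactions, so this person went from Wallmart to Bestbuy 4 times. Likewise, when the person switches from Bestbuy to Frys, that switch also appears 4 times, because Bestbuy has 4 transactions.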
So my new data should look like this:
ConsumerID FromRetailer ToRetailer FromDate ToDate FP
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
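FP is basically based on the number of times the person switched: here they switched from one transaction to 4, so FP is 1/4; if it had been from 2 transactions to 4, FP would be 1/8.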
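Reading the examples in the question (1 transaction to 4 gives 1/4, 4 to 1 gives 1/4, 2 to 4 gives 1/8, and the one-to-one Wallmart-to-Wallmart switch gives 1), FP appears to be one over the product of the transaction counts on the two trips. This is an interpretation of the examples, not a rule the question states explicitly:

```latex
\mathrm{FP} = \frac{1}{n_{\mathrm{from}} \, n_{\mathrm{to}}}
\qquad
1 \to 4:\ \frac{1}{1 \cdot 4} = \frac{1}{4}
\qquad
4 \to 1:\ \frac{1}{4 \cdot 1} = \frac{1}{4}
\qquad
2 \to 4:\ \frac{1}{2 \cdot 4} = \frac{1}{8}
\qquad
1 \to 1:\ \frac{1}{1 \cdot 1} = 1
```

Under this reading each switch is written out as n_from · n_to rows, so the FP values belonging to one switch add up to 1 (for example 4 × 1/4).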
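My main problem is, first, how to compare the first row with the second row, and second, how to treat rows that share a date (for example, if the second row has the same date) as one trip, so that the consumer goes from Wallmart to Best Buy 4 times.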
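A minimal SAS sketch of the row-versus-previous-row comparison, assuming the column names from the header above (ConsumerID, Retailer, Product_Code, Shopping_Date); the new variables prev_retailer, prev_product, prev_date and is_switch are hypothetical names for illustration. LAG is called on every row, and the lagged values are cleared on the first row of each consumer, so one consumer is never compared with another:

```sas
/* Sample rows from the question */
data consumerData;
  length Retailer $ 20;
  input ConsumerID Retailer $ Product_Code Shopping_Date;
  datalines;
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
;
run;

/* BY-group processing needs the data sorted by ConsumerID */
proc sort data=consumerData;
  by ConsumerID Shopping_Date;
run;

data compared;
  set consumerData;
  by ConsumerID;
  /* Call LAG unconditionally on every row so each queue stays aligned */
  prev_retailer = lag(Retailer);
  prev_product  = lag(Product_Code);
  prev_date     = lag(Shopping_Date);
  /* The first row of a consumer has no predecessor for that consumer */
  if first.ConsumerID then call missing(prev_retailer, prev_product, prev_date);
  /* 1 = store or product differs from the previous row, 0 = otherwise */
  is_switch = (not first.ConsumerID)
              and (Retailer ne prev_retailer or Product_Code ne prev_product);
run;
```

On this sample only rows 3 and 7 are flagged, that is, one Wallmart-to-Bestbuy switch and one Bestbuy-to-Frys switch. Getting four output rows per switch requires grouping the same-date rows into trips first.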
More explanation:
1 Wallmart 12345 20090721
1 Wallmart 23456 20090722
1 Wallmart 23456 20090821
The output should be:
Consumer_ID From_Store To_Store From_Product To_Product FP
1 Wallmart Wallmart 12345 23456 1
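Explanation: there are two types of switches, product switches and store switches, so the condition for a switch is retailer != retailer2 or productCode != productCode2; if either holds, it is a switch (product or store).
FP is simply computed from the switch counts. In the example above, the consumer went from Wallmart to Wallmart, which is a one-to-one switch, so FP is 1. In the earlier example, the person went from Wallmart to Bestbuy with FP 1/4, because they bought 4 items at Bestbuy.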
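Shopping_Date is very important because, for example, when a person buys 4 items on the same day, those purchases form one shopping trip, but when we display the data we still show all 4 transactions. The reason I call it one trip is that we do not compare Bestbuy with Bestbuy when those rows fall on the same Shopping_Date.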
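A sketch of the trip idea, assuming a trip is one combination of ConsumerID, Shopping_Date and Retailer: the BY-group flags number the trips, and PROC SQL remerges the per-trip row count onto every transaction, so all 4 Bestbuy rows stay in the data but share one trip number. The variables trip and n_in_trip are hypothetical:

```sas
/* Sample rows from the question */
data consumerData;
  length Retailer $ 20;
  input ConsumerID Retailer $ Product_Code Shopping_Date;
  datalines;
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
;
run;

proc sort data=consumerData;
  by ConsumerID Shopping_Date Retailer;
run;

/* A new trip starts whenever the consumer, the date or the store changes */
data trips;
  set consumerData;
  by ConsumerID Shopping_Date Retailer;
  if first.ConsumerID then trip = 0;
  if first.Retailer then trip + 1;   /* sum statement, so TRIP is retained */
run;

/* Attach the number of transactions on each trip to every row of that trip */
proc sql;
  create table trips_sized as
  select t.*, count(*) as n_in_trip
  from trips as t
  group by t.ConsumerID, t.trip
  order by ConsumerID, trip;
quit;
```

With this sample the four Bestbuy rows get trip = 3 and n_in_trip = 4, while every other row is a one-transaction trip. SAS writes a NOTE that the query requires remerging summary statistics; that remerge is what copies the count back onto each row.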
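So, to summarize: the data contains multiple Consumer_IDs, and each Consumer_ID has to be compared on its own, so we group by Consumer_ID.
Then we check whether there is a store switch or a product switch. If there is a store/product switch, we compare the rows; rows with the same purchase date are grouped into one trip, but we still show all 4 transactions in the final output.
Sample data: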
1 Wallmart 12345 20090721
1 Wallmart 23456 20090722
1 Wallmart 23456 20090724
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 3456 20090903
2 Frys 12455 20090905
2 Frys 3456 20090904
2 Frys 3456 20090904
Output data:
Consumer_ID From_Store To_Store From_Product To_Product From_Date To_Date FP Type of Switch
1 Wallmart Wallmart 12345 23456 20090721 20090724 1 Product_Switch
1 Wallmart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Wallmart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Wallmart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Wallmart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
2 Frys Frys 12455 3456 20090905 20090904 1 Store_Switch
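Note: each consumer is treated separately; we never compare one consumer's transactions with another consumer's. I hope this helps. We don't need the Type of Switch column; I only put it there to make the output easier to understand.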
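Putting the pieces together, one possible approach (a sketch, not the method used in the answer below) is to number the trips and then, in PROC SQL, pair every row of a trip with every row of the same consumer's next trip. It assumes the question's column names, that trips are ordered by date, and the reading of FP as 1 / (transactions on the from-trip * transactions on the to-trip). The tables trips, trip_size and switches are hypothetical names, and FP is stored as a number (0.25 rather than 1/4):

```sas
/* Second sample from the question */
data consumerData;
  length Retailer $ 20;
  input ConsumerID Retailer $ Product_Code Shopping_Date;
  datalines;
1 Wallmart 12345 20090721
1 Wallmart 23456 20090722
1 Wallmart 23456 20090724
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 3456 20090903
2 Frys 12455 20090905
2 Frys 3456 20090904
2 Frys 3456 20090904
;
run;

proc sort data=consumerData;
  by ConsumerID Shopping_Date Retailer;
run;

/* A trip is one consumer/date/store combination */
data trips;
  set consumerData;
  by ConsumerID Shopping_Date Retailer;
  if first.ConsumerID then trip = 0;
  if first.Retailer then trip + 1;
run;

proc sql;
  /* Number of transactions on each trip */
  create table trip_size as
  select ConsumerID, trip, count(*) as n
  from trips
  group by ConsumerID, trip;

  /* Pair each row of a trip with each row of the next trip of the same
     consumer; keep only pairs where the store or the product changed */
  create table switches as
  select a.ConsumerID      as Consumer_ID,
         a.Retailer        as From_Store,
         b.Retailer        as To_Store,
         a.Product_Code    as From_Product,
         b.Product_Code    as To_Product,
         a.Shopping_Date   as From_Date,
         b.Shopping_Date   as To_Date,
         1 / (sa.n * sb.n) as FP
  from trips as a
       inner join trips as b
         on a.ConsumerID = b.ConsumerID and b.trip = a.trip + 1
       inner join trip_size as sa
         on sa.ConsumerID = a.ConsumerID and sa.trip = a.trip
       inner join trip_size as sb
         on sb.ConsumerID = b.ConsumerID and sb.trip = b.trip
  where a.Retailer ne b.Retailer or a.Product_Code ne b.Product_Code
  order by Consumer_ID, From_Date, To_Date;
quit;
```

For consumer 1 this produces the 1 + 4 + 4 rows of the desired output, with To_Date taken from the next trip (so the product switch ends on 20090722 rather than 20090724). For consumer 2, sorting by date puts the two 20090904 purchases first, so the sketch returns two Frys-to-Frys rows with FP = 1/2 instead of the single row shown above.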
Some code I wrote:
data work.switches;
  set work.consumerData;
  by ConsumerID Shopping_Date;
  from_retailer=lag(retailer);
  to_retailer=retailer;
  from_product_code=lag(product_code);
  to_product_code=product_code;
  if from_retailer ne to_retailer or from_product_code ne to_product_code then do;
    /* Not sure what to do here (loop until last.Shopping_Date?) */
  end;
run;
Answer:
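This may not be the best or simplest solution, but I couldn't think of another way to approach it. Also, I don't think I computed your FP variable correctly, but maybe this can help you get what you need.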
*Import data;
data consumerData;
input Consumer_ID Store $ Product Date;
datalines;
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
;
run;
*Make sure data are sorted by Consumer_ID, Date and Store;
proc sort data=consumerData;
  BY Consumer_ID Date Store;
run;
*Number of purchases per consumer, store and date (one shopping trip);
proc freq data=consumerData noprint;
  TABLE Consumer_ID*Store*Date / out=freq;
run;
*Store the largest trip size in macro variable TOTALPURCH (used to size the arrays);
proc sql noprint;
  select max(count) into :totalPurch trimmed from freq;
quit;
data work.switches (Drop=Store Product Date product_: lagProd_: i j lagNObs NObs);
  length Type_of_Switch $30;
  set work.consumerData;
  BY Consumer_ID Date Store;
  *count number of records per shopping trip (same store, same date);
  NObs+1;
  if first.Store then NObs=1;
  *create array to hold the different purchases from one store on the same date;
  retain product_1-product_&totalPurch;
  array arProduct[*] product_1-product_&totalPurch;
  do i=1 to &totalPurch;
    if NObs=i then arProduct[i]=Product;
    if NObs=1 then do j=2 to &totalPurch;
      arProduct[j]=.; *reset;
    end;
  end;
  *only keep one record (the last) for each shopping trip;
  if not last.Store then delete;
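  *Compute From and To Store and Date variables;
  From_Store=lag(Store);
  To_Store=Store;
  From_Date=lag(Date);
  To_Date=Date;
  *Get lagged product values. LAG inside a DO loop shares a single queue,
   so the products of the previous trip are kept in a retained array instead;
  lagNObs=lag(NObs); *number of purchases on the previous trip;
  array lagProd[*] lagProd_1-lagProd_&totalPurch;
  retain product_prev_1-product_prev_&totalPurch;
  array prevProd[*] product_prev_1-product_prev_&totalPurch;
  do i=1 to &totalPurch;
    lagProd[i]=prevProd[i];   *products bought on the previous trip;
    prevProd[i]=arProduct[i]; *remember this trip for the next one;
  end;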
  *Compute From and To Product variables;
  do i=1 to NObs;
    if lagNObs^=. then do j=1 to lagNObs;
      From_Product=lagProd[j];
      To_Product=arProduct[i];
      FP=strip(lagNObs)||'/'||strip(NObs);
      *Label types of switches;
      if From_Store ne To_Store then Type_of_Switch='Store Switch';
      if From_Product ne To_Product then Type_of_Switch='Product Switch';
      if (From_Product ne To_Product) and (From_Store ne To_Store) then Type_of_Switch='Product and Store Switch';
      if j>1 then output switches;
    end;
    if i>1 then output;
  end;
  *Delete first occurrence to eliminate obs without lag;
  if first.Consumer_ID then delete;
  *Output switches;
  if (From_Store ne To_Store) or (From_Product ne To_Product) then output;
run;