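How do I compare rows one by one with some additional conditions?

Time: 2014-02-07 12:03:23

Tags: sas

I have some consumer data that shows where each consumer shopped. The full data has more rows with different consumer IDs, but for testing purposes I have included most of the data here: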

ConsumerID  Retailer    Product_Code     Shopping_Date
    1       Wallmart    12345            20090721
    1       Wallmart    12345            20090722
    1       Bestbuy     23456            20090801
    1       Bestbuy     23456            20090801
    1       Bestbuy     23456            20090801
    1       Bestbuy     23456            20090801
    1       Frys        23444            20090908

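Basically, if the retailer is not the same and the product code is not the same, the consumer has switched retailer and product. For example, the first and second rows have the same retailer, so they are not counted in the new data set. The new data set I want to create should contain FromRetailer and ToRetailer. If the Shopping_Date is the same, the rows count as one trip, but the final result should still show 4 transactions, so this person went from Wallmart to Bestbuy 4 times. The person then switched from Bestbuy to Frys, again 4 times, because Bestbuy had 4 transactions.

So my new data should look like this: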

ConsumerID  FromRetailer ToRetailer    FromDate        ToDate      FP
        1   Wallmart     BestBuy        20090722       20090801    1/4
        1   Wallmart     BestBuy        20090722       20090801    1/4
        1   Wallmart     BestBuy        20090722       20090801    1/4
        1   Wallmart     BestBuy        20090722       20090801    1/4
        1   Bestbuy      Frys           20090722       20090908    1/4
        1   Bestbuy      Frys           20090722       20090908    1/4
        1   Bestbuy      Frys           20090722       20090908    1/4
        1   Bestbuy      Frys           20090722       20090908    1/4
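FP is basically based on the number of switches this person made: here they switched from one transaction to 4, which gives 1/4. If it had been a switch from 2 transactions to 4, FP would be 1/8.

My main problem is how to compare the first row with the second row. The next problem is, for example, if the following rows have the same date, how to classify them as one trip, so that the consumer went from Wallmart to Bestbuy 4 times.

More explanation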

1       Wallmart    12345            20090721
1       Wallmart    23456            20090722
1       Wallmart    23456            20090821

The output should be

   Consumer_ID  From_Store   To_Store          From_Product    To_Product  FP
    1             Wallmart     Wallmart         12345           23456        1

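Explanation: there are two types of switch, a product switch and a store switch, so the condition for a switch should be retailer != retailer2 or productCode != productCode2. If either holds, it is a switch (product or store).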
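The condition above could be sketched in a DATA step roughly as follows. This is only a minimal sketch: it assumes work.consumerData is already sorted by ConsumerID and Shopping_Date and uses the column names from the first table; work.flagged, prev_retailer, prev_product and switch_type are hypothetical names introduced for illustration. It compares each row with the previous row of the same consumer and does not yet group same-date rows into trips.

```sas
/* Sketch only: flag each row as a store and/or product switch
   relative to the previous row of the same consumer. */
data work.flagged;
    length switch_type $24;
    set work.consumerData;
    by ConsumerID;

    /* call lag() on every row so its queue always holds the previous row */
    prev_retailer = lag(Retailer);
    prev_product  = lag(Product_Code);

    if first.ConsumerID then switch_type = ' ';   /* nothing to compare with */
    else if Retailer ne prev_retailer and Product_Code ne prev_product
        then switch_type = 'Product and Store Switch';
    else if Retailer ne prev_retailer then switch_type = 'Store Switch';
    else if Product_Code ne prev_product then switch_type = 'Product Switch';
    else switch_type = ' ';                       /* no switch */
run;
```

Calling lag() unconditionally matters here: lag() returns the value from its previous execution, not from the previous row, so placing it inside an IF block would compare against the wrong row.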

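FP is calculated simply from the switch counts. In the example above, the consumer went from Wallmart to Wallmart as a one-to-one switch, so FP is 1. In the earlier example, the person went from Wallmart to Bestbuy with an FP of 1/4, because they bought 4 items at Bestbuy.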
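One reading that is consistent with the three FP values quoted in the question (1, 1/4 and 1/8) is FP = 1 / (items on the previous trip × items on the current trip). That formula is an assumption, not something the question states outright, and the names work.fp_demo, from_n and to_n are hypothetical. A small sketch to check it against those cases:

```sas
/* Sketch only: FP under the assumed rule 1 / (from_n * to_n). */
data work.fp_demo;
    input from_n to_n;
    FP = 1 / (from_n * to_n);
    datalines;
1 1
1 4
2 4
;
run;

proc print data=work.fp_demo;
run;
```

The three rows give 1, 0.25 and 0.125. Under the same rule a switch from 4 items to 1 item also gives 0.25, which would match the Bestbuy to Frys rows in the expected output.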

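Shopping_Date is very important because, for example, when a person buys 4 items it becomes one shopping trip, but when we display the data we still show 4 transactions. The reason I say it becomes one trip is that we do not compare Bestbuy with Bestbuy when the rows occur on the same Shopping_Date.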
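Grouping same-date rows into trips could be sketched with PROC SQL as below. This assumes a trip is identified by ConsumerID, Shopping_Date and Retailer together, which is one plausible reading of the description; work.trips and n_items are hypothetical names.

```sas
/* Sketch only: one row per shopping trip, with the number of
   transactions that trip contains. */
proc sql;
    create table work.trips as
    select ConsumerID,
           Shopping_Date,
           Retailer,
           count(*) as n_items
    from work.consumerData
    group by ConsumerID, Shopping_Date, Retailer
    order by ConsumerID, Shopping_Date;
quit;
```

With the first sample table, the four Bestbuy rows on 20090801 would collapse to a single trip with n_items = 4, so Bestbuy rows are never compared with each other, while the count is still available for expanding the output back to 4 transactions.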

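So, to summarize: the data contains multiple Consumer_IDs, and the comparison has to be done separately for each Consumer_ID, so we group by Consumer_ID.

Then we check whether there is a store switch or a product switch. If there is a store/product switch, we compare the rows. Rows with the same purchase date are classified as one trip, but we still show 4 transactions in the final output. Sample data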

 1       Wallmart    12345            20090721
 1       Wallmart    23456            20090722
 1       Wallmart    23456            20090724
 1       Bestbuy     23456            20090801
 1       Bestbuy     23456            20090801
 1       Bestbuy     23456            20090801
 1       Bestbuy     23456            20090801
 1       Frys        3456             20090903
 2       Frys        12455            20090905
 2       Frys        3456             20090904
 2       Frys        3456             20090904

Output data

Consumer_ID  From_Store  To_Store  From_Product  To_Product  From_Date  To_Date   FP   Type of Switch
1            Wallmart    Wallmart  12345         23456       20090721   20090724  1    Product_Switch
1            Wallmart    Bestbuy   23456         23456       20090724   20090801  1/4  Store Switch
1            Wallmart    Bestbuy   23456         23456       20090724   20090801  1/4  Store Switch
1            Wallmart    Bestbuy   23456         23456       20090724   20090801  1/4  Store Switch
1            Wallmart    Bestbuy   23456         23456       20090724   20090801  1/4  Store Switch
1            Bestbuy     Frys      23456         3456        20090801   20090903  1/4  Store Switch
1            Bestbuy     Frys      23456         3456        20090801   20090903  1/4  Store Switch
1            Bestbuy     Frys      23456         3456        20090801   20090903  1/4  Store Switch
1            Bestbuy     Frys      23456         3456        20090801   20090903  1/4  Store Switch
2            Frys        Frys      12455         3456        20090905   20090904  1    Store_Switch

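Note: each consumer is treated separately; we do not compare one consumer's transactions with another consumer's. I hope this helps. We do not need the Type of Switch column; I only put it there to aid understanding.

Some code I have written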

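data work.switches;
    /* assumes work.consumerData is sorted by ConsumerID and Shopping_Date */
    set work.consumerData;
    by ConsumerID;

    /* lag() runs on every row so its queue always holds the previous row */
    from_retailer     = lag(retailer);
    to_retailer       = retailer;
    from_product_code = lag(product_code);
    to_product_code   = product_code;

    /* the first row of a consumer has nothing to compare with */
    if first.ConsumerID then delete;

    if from_retailer ne to_retailer or from_product_code ne to_product_code then do;
        /* Not sure what to do here: rows with the same Shopping_Date
           still have to be grouped into one trip */
        output;
    end;
run;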

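1 Answer:

Answer 0: (score: 0)

So this may not be the best or simplest solution, but I couldn't think of another way to solve this. Also, I don't think I computed your FP variable correctly... but maybe this can help you get what you need.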

*Import data;
data consumerData;
    input Consumer_ID Store $ Product Date;
    datalines;
    1 Wallmart 12345 20090721
    1 Wallmart 12345 20090722
    1 Bestbuy 23456 20090801
    1 Bestbuy 23456 20090801
    1 Bestbuy 23456 20090801
    1 Bestbuy 23456 20090801
    1 Frys 23444 20090908
;
run;

*Make sure data sorted by consumer_id and date;
proc sort data=consumerData;
    BY Consumer_ID Date Store;
run;

*Largest number of purchases made on one trip. Get from dataset;
proc freq data=consumerData noprint;
    TABLE Consumer_ID*Store*Date/out=freq;
run;
proc sql noprint;
    select max(count) into :totalPurch from freq;
quit;
*Strip the leading blanks that INTO leaves in the macro variable;
%let totalPurch=&totalPurch;

data work.switches (keep=Consumer_ID From_Store To_Store From_Date To_Date
                         From_Product To_Product FP Type_of_Switch);
    length Type_of_Switch $30 From_Store To_Store lagStore $8 FP $12;
    set work.consumerData;
    BY Consumer_ID Date Store;

    *arrays holding the products of the current trip and of the previous trip;
    array arProduct[*] product_1-product_&totalPurch;
    array lagProd[*] lagProd_1-lagProd_&totalPurch;
    retain product_1-product_&totalPurch lagProd_1-lagProd_&totalPurch
           NObs lagNObs lagStore lagDate;

    *a new consumer has no previous trip to compare with;
    if first.Consumer_ID then lagNObs=.;

    *count number of records per shopping trip (same date and same store);
    if first.Store then do;
        NObs=0;
        call missing(of arProduct[*]); *reset;
    end;
    NObs+1;
    arProduct[NObs]=Product;

    *only act once per shopping trip, on its last record;
    if last.Store then do;
        To_Store=Store;
        To_Date=Date;

        *Compare every previous purchase with every current purchase;
        *lag() is avoided here because a lag() call inside a loop shares one queue;
        if lagNObs^=. then do i=1 to NObs;
            do j=1 to lagNObs;
                From_Store=lagStore;
                From_Date=lagDate;
                From_Product=lagProd[j];
                To_Product=arProduct[i];
                FP=cats(lagNObs,'/',NObs);
                *Label Types of Switches;
                if (From_Product ne To_Product) and (From_Store ne To_Store)
                    then Type_of_Switch='Product and Store Switch';
                else if From_Store ne To_Store then Type_of_Switch='Store Switch';
                else if From_Product ne To_Product then Type_of_Switch='Product Switch';
                else Type_of_Switch=' ';
                *Output switches, one row per pair of purchases;
                if Type_of_Switch ne ' ' then output;
            end;
        end;

        *remember this trip as the previous trip for the next comparison;
        lagNObs=NObs;
        lagStore=Store;
        lagDate=Date;
        do i=1 to dim(arProduct);
            lagProd[i]=arProduct[i];
        end;
    end;
run;