Transaction list to basket data

Time: 2017-03-17 03:26:52

Tags: r list transactional

I have a table like this:

ID    Productpurchased   Year
1A          Abc          2011
1A          Abc          2011       
1A          xyz          2011
1A          Abc          2012
2A          bcd          2013
2A          Abc          2013

The desired output format is:

ID       Purchase basket     Year     Abc-count  xyz-count  bcd-count    
1A       (Abc,xyz)           2011      2           1          0
1A       (Abc)               2012      1           0          0
2A       (bcd , Abc)         2013      1           0          1

2 answers:

Answer 0 (score: 1)

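We can do this easily with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'ID' and 'Year', paste together the unique elements of 'Productpurchased' and assign (:=) the result to create the 'Purchase_basket' column. Then dcast from 'long' to 'wide' format, specifying fun.aggregate as length so that each cell holds the number of purchases of that product.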

library(data.table)
dcast(setDT(df1)[, Purchase_basket := toString(unique(Productpurchased)),.(ID, Year)],
       ID + Year + Purchase_basket ~paste0(Productpurchased, ".count"), length)
#    ID Year Purchase_basket Abc.count bcd.count xyz.count
#1: 1A 2011        Abc, xyz         2         0         1
#2: 1A 2012             Abc         1         0         0
#3: 2A 2013        bcd, Abc         1         1         0
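
For readers without data.table, the same two steps (one basket string per ID/Year group, then a count of each product per group) can be sketched in base R. This is only an illustration, not the answerer's method: it assumes df1 holds the sample data from the question, and the names baskets, counts and result are made up here.

```r
# Sample data from the question; df1 is the name the answer above assumes
df1 <- data.frame(
  ID = c("1A", "1A", "1A", "1A", "2A", "2A"),
  Productpurchased = c("Abc", "Abc", "xyz", "Abc", "bcd", "Abc"),
  Year = c(2011, 2011, 2011, 2012, 2013, 2013),
  stringsAsFactors = FALSE
)

# Step 1: one basket string per (ID, Year), unique products in order of appearance
baskets <- aggregate(Productpurchased ~ ID + Year, data = df1,
                     FUN = function(x) toString(unique(x)))
names(baskets)[names(baskets) == "Productpurchased"] <- "Purchase_basket"

# Step 2: cross-tabulate (ID, Year) keys against products to get the counts
tab <- table(paste(df1$ID, df1$Year), df1$Productpurchased)
counts <- as.data.frame.matrix(tab)
names(counts) <- paste0(names(counts), ".count")

# Step 3: line the count rows up with the basket rows by key and combine
result <- cbind(baskets,
                counts[paste(baskets$ID, baskets$Year), , drop = FALSE])
rownames(result) <- NULL
print(result)
```

Because table() creates a column for every product that appears anywhere in the data, products missing from a group get a count of 0 rather than NA.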

Answer 1 (score: 0)

Exactly the same logic as the data.table answer, but using dplyr.

df_2 <- read.table(text = 'ID    Productpurchased   Year
1A          Abc          2011
1A          Abc          2011       
1A          xyz          2011
1A          Abc          2012
2A          bcd          2013
2A          Abc          2013',
header = TRUE, stringsAsFactors = FALSE)



library(dplyr)

df_2 %>% group_by(ID, Year) %>%
  # Flag each row per product; note that grepl() matches substrings
  mutate(Abc_count = grepl("Abc", Productpurchased),
         bcd_count = grepl("bcd", Productpurchased),
         xyz_count = grepl("xyz", Productpurchased)) %>%
  # Collapse each group: build the basket string and sum the flags into counts
  summarise(Productpurchased = paste("(", paste(unique(Productpurchased), collapse = ","), ")", sep = ""),
            Abc_count = sum(Abc_count),
            bcd_count = sum(bcd_count),
            xyz_count = sum(xyz_count))
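
The version above hard-codes the three product names. A possible variant that derives the count columns from the data is sketched below. It assumes the tidyr package is available in addition to dplyr (tidyr is not used in the original answer), and the names baskets, counts and result are illustrative only.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df_2 <- data.frame(
  ID = c("1A", "1A", "1A", "1A", "2A", "2A"),
  Productpurchased = c("Abc", "Abc", "xyz", "Abc", "bcd", "Abc"),
  Year = c(2011, 2011, 2011, 2012, 2013, 2013),
  stringsAsFactors = FALSE
)

# One basket string per (ID, Year), e.g. "(Abc,xyz)"
baskets <- df_2 %>%
  group_by(ID, Year) %>%
  summarise(
    Purchase_basket = paste0("(", paste(unique(Productpurchased), collapse = ","), ")"),
    .groups = "drop"
  )

# Count rows per (ID, Year, product), then spread products into columns;
# groups that never bought a product get 0 instead of NA
counts <- df_2 %>%
  count(ID, Year, Productpurchased) %>%
  pivot_wider(names_from = Productpurchased, values_from = n,
              names_glue = "{Productpurchased}_count", values_fill = 0)

# Combine baskets and counts on the grouping keys
result <- inner_join(baskets, counts, by = c("ID", "Year"))
print(result)
```

Counting with count() compares whole product names, so a product such as "Abcd" would not be counted under "Abc" the way a grepl("Abc", ...) match would.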