I have a table like this:
ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013
Desired output format:
ID Purchase basket Year Abc-count xyz-count bcd-count
1A (Abc,xyz) 2011 2 1 0
1A (Abc) 2012 1 0 0
2A (bcd , Abc) 2013 1 0 1
Answer 0 (score: 1)
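We can do this easily with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'ID' and 'Year', paste the unique elements of 'Productpurchased' into a single string (here with toString) and assign (:=) it to create the 'Purchase_basket' column. Then dcast from 'long' to 'wide' format, specifying fun.aggregate as length so that each product column holds the number of purchases.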
library(data.table)
dcast(setDT(df1)[, Purchase_basket := toString(unique(Productpurchased)), .(ID, Year)],
      ID + Year + Purchase_basket ~ paste0(Productpurchased, ".count"), length)
# ID Year Purchase_basket Abc.count bcd.count xyz.count
#1: 1A 2011 Abc, xyz 2 0 1
#2: 1A 2012 Abc 1 0 0
#3: 2A 2013 bcd, Abc 1 1 0
Answer 1 (score: 0)
Exactly the same logic as the data.table answer, but using dplyr.
df_2 <- read.table(text = 'ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013',
header = TRUE, stringsAsFactors = FALSE)
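library(dplyr)

df_2 %>% group_by(ID, Year) %>%
  # Flag each row with the product it contains (TRUE/FALSE per product).
  mutate(Abc_count = grepl("Abc", Productpurchased),
         bcd_count = grepl("bcd", Productpurchased),
         xyz_count = grepl("xyz", Productpurchased)) %>%
  # Collapse each (ID, Year) group: build the basket string and sum the flags.
  summarise(Productpurchased = paste("(", paste(unique(Productpurchased), collapse = ","), ")", sep = ""),
            Abc_count = sum(Abc_count),
            bcd_count = sum(bcd_count),
            xyz_count = sum(xyz_count))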
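The dplyr code above names the three products by hand, so it would need editing whenever a new product appears. As a possible variation (not part of the original answer), the sketch below builds the count columns from the data instead. It assumes the tidyr package is installed; the names purchases, result and Purchase_basket are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question.
purchases <- read.table(text = "ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013", header = TRUE, stringsAsFactors = FALSE)

result <- purchases %>%
  group_by(ID, Year) %>%
  # Build the basket string once per (ID, Year) group.
  mutate(Purchase_basket = paste0("(", paste(unique(Productpurchased), collapse = ","), ")")) %>%
  ungroup() %>%
  # One row per (ID, Year, basket, product) holding its number of occurrences.
  count(ID, Year, Purchase_basket, Productpurchased) %>%
  # Spread the products into columns; products absent from a group get 0.
  pivot_wider(names_from = Productpurchased, values_from = n,
              values_fill = 0, names_glue = "{Productpurchased}_count")

print(result)
```

Because the column names come from the values of Productpurchased, a product added to the data later would get its own count column without any change to the code.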