I have a dataset with 4 columns:
Time User.ID Campaign.ID ZIP.Postal.Code
1.495062e+15 AMsySZY9u3XoNZ4qOfmK2JnaXbBg 10852036 H3H
1.495061e+15 AMsySZZE17Pzu6wwv_HkNhVDYSFJ 10852036 L8E
1.495061e+15 AMsySZa8l0q0G9zNCsqGQ9-y5MYi 11181834 G1V
1.495060e+15 AMsySZZOF_CrRXtClA8dna1W-YVg 11181834 T2N
1.495061e+15 AMsySZaGnaf3z8Q7BzFkzxhLD76R 10852036 V7H
1.495061e+15 AMsySZb_uZeGo8NmzdWUBbEL7HEl 11272183 N2C
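Each row records the time at which a user (identified by a unique User.ID) clicked a particular ad (identified by Campaign.ID). This dataset contains roughly 15 campaign IDs.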
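For anyone who wants to reproduce this, the sample rows above can be rebuilt as a data frame. This is only a sketch: the name `df` and the column types are assumptions, and the values are copied from the sample shown.

```r
# Rebuild the six sample rows shown above (the name `df` is an assumption).
df <- data.frame(
  Time = c(1.495062e+15, 1.495061e+15, 1.495061e+15,
           1.495060e+15, 1.495061e+15, 1.495061e+15),
  User.ID = c("AMsySZY9u3XoNZ4qOfmK2JnaXbBg", "AMsySZZE17Pzu6wwv_HkNhVDYSFJ",
              "AMsySZa8l0q0G9zNCsqGQ9-y5MYi", "AMsySZZOF_CrRXtClA8dna1W-YVg",
              "AMsySZaGnaf3z8Q7BzFkzxhLD76R", "AMsySZb_uZeGo8NmzdWUBbEL7HEl"),
  Campaign.ID = c(10852036L, 10852036L, 11181834L,
                  11181834L, 10852036L, 11272183L),
  ZIP.Postal.Code = c("H3H", "L8E", "G1V", "T2N", "V7H", "N2C"),
  stringsAsFactors = FALSE
)

# One row per click event, four columns.
str(df)
```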
I would like to reorganize this dataset into the following table:
User.ID Click_10852036 Click_11181834 ...
AMsySZY9u3XoNZ4qOfmK2JnaXbBg 1 0
AMsySZb_uZeGo8NmzdWUBbEL7HEl 0 3
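Each row represents one user (User.ID is unique in this table), and each column holds the number of times that user clicked the corresponding ad.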
I know I can do this with ddply:
table_c = ddply(data, .(User.ID), summarize,
click_10852036 = sum(Campaign.ID == '10852036'),
click_9349165 = sum(Campaign.ID == '9349165'),
click_11272183 = sum(Campaign.ID == '11272183'),
click_11266100 = sum(Campaign.ID == '11266100'),
click_11181834 = sum(Campaign.ID == '11181834'),
click_10950859 = sum(Campaign.ID == '10950859'),
click_11224930 = sum(Campaign.ID == '11224930'),
click_11224368 = sum(Campaign.ID == '11224368'),
click_11029515 = sum(Campaign.ID == '11029515'),
click_9123038 = sum(Campaign.ID == '9123038'),
click_10748814 = sum(Campaign.ID == '10748814'),
click_10792241 = sum(Campaign.ID == '10792241'),
click_11152245 = sum(Campaign.ID == '11152245'),
click_10675627 = sum(Campaign.ID == '10675627'),
click_8532119 = sum(Campaign.ID == '8532119'),
click_10811017 = sum(Campaign.ID == '10811017'),
click_10694683 = sum(Campaign.ID == '10694683'),
click_11463760 = sum(Campaign.ID == '11463760'),
click_9676864 = sum(Campaign.ID == '9676864'),
click_10847880 = sum(Campaign.ID == '10847880'))
Is there a way to produce this summary table without explicitly writing out every column header?
Thanks
Answer 0 (score: 1)
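library(reshape2)
# Give every click a weight of 1 so that clicks can be summed per cell
df$Count = 1
# One row per User.ID, one column per Campaign.ID. Repeated clicks are summed,
# and user/campaign pairs with no clicks are filled with 0 instead of NA
df1 = as.data.frame(acast(df, User.ID ~ Campaign.ID, value.var = "Count",
                          fun.aggregate = sum, fill = 0))
names(df1) = paste0('Click_', names(df1))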
> df1
Click_10852036 Click_11181834 Click_11272183
AMsySZa8l0q0G9zNCsqGQ9-y5MYi 0 1 0
AMsySZaGnaf3z8Q7BzFkzxhLD76R 1 0 0
AMsySZb_uZeGo8NmzdWUBbEL7HEl 0 0 1
AMsySZY9u3XoNZ4qOfmK2JnaXbBg 1 0 0
AMsySZZE17Pzu6wwv_HkNhVDYSFJ 1 0 0
AMsySZZOF_CrRXtClA8dna1W-YVg 0 1 0
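As a possible alternative that needs no extra package, base R's `table()` can build the same user-by-campaign count matrix. The following is a minimal sketch on the sample rows from the question, not the answer author's method; the names `counts` and `result` are illustrative assumptions.

```r
# Sample rows copied from the question; only the two columns needed here.
df <- data.frame(
  User.ID = c("AMsySZY9u3XoNZ4qOfmK2JnaXbBg", "AMsySZZE17Pzu6wwv_HkNhVDYSFJ",
              "AMsySZa8l0q0G9zNCsqGQ9-y5MYi", "AMsySZZOF_CrRXtClA8dna1W-YVg",
              "AMsySZaGnaf3z8Q7BzFkzxhLD76R", "AMsySZb_uZeGo8NmzdWUBbEL7HEl"),
  Campaign.ID = c(10852036L, 10852036L, 11181834L,
                  11181834L, 10852036L, 11272183L),
  stringsAsFactors = FALSE
)

# table() counts every User.ID / Campaign.ID pair; absent pairs become 0.
counts <- as.data.frame.matrix(table(df$User.ID, df$Campaign.ID))
names(counts) <- paste0("Click_", names(counts))

# Optionally move the row names into a regular User.ID column
# (`result` is a hypothetical name used only for this sketch).
result <- data.frame(User.ID = rownames(counts), counts,
                     row.names = NULL, check.names = FALSE)
print(result)
```

Because the columns come from whatever campaign IDs occur in the data, nothing has to be listed by hand, and a user who clicks the same campaign several times is counted correctly.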