I have the following data.frame:
> tail(contacts.byChannel.weekly, 20)
WEEK_START WEEK_END COMM_TYPE_CODE CONTACT_CHANNEL TOTAL_CONTACTS TOTAL_HMD TOTAL_HMD_NO NRR
1: 2015-05-03 2015-05-09 PHONE PHONE - OTHER 326 104 14 0.13461538
2: 2015-05-03 2015-05-09 PHONE PHONE - OTHER_DD 313 89 8 0.08988764
3: 2015-05-10 2015-05-16 CHAT CHAT 576 132 20 0.15151515
4: 2015-05-10 2015-05-16 EMAIL EMAIL 933 124 37 0.29838710
5: 2015-05-10 2015-05-16 PHONE PHONE - C2C 203 50 12 0.24000000
6: 2015-05-10 2015-05-16 PHONE PHONE - GOOGLE 197 48 3 0.06250000
7: 2015-05-10 2015-05-16 PHONE PHONE - OTHER 487 166 25 0.15060241
8: 2015-05-10 2015-05-16 PHONE PHONE - OTHER_DD 334 90 12 0.13333333
9: 2015-05-17 2015-05-23 CHAT CHAT 568 107 17 0.15887850
10: 2015-05-17 2015-05-23 EMAIL EMAIL 1023 141 39 0.27659574
11: 2015-05-17 2015-05-23 PHONE PHONE - C2C 156 44 5 0.11363636
12: 2015-05-17 2015-05-23 PHONE PHONE - GOOGLE 224 46 7 0.15217391
13: 2015-05-17 2015-05-23 PHONE PHONE - OTHER 553 165 11 0.06666667
14: 2015-05-17 2015-05-23 PHONE PHONE - OTHER_DD 386 108 11 0.10185185
15: 2015-05-24 2015-05-30 CHAT CHAT 25 2 1 0.50000000
16: 2015-05-24 2015-05-30 EMAIL EMAIL 33 3 2 0.66666667
17: 2015-05-24 2015-05-30 PHONE PHONE - C2C 8 0 0 NaN
18: 2015-05-24 2015-05-30 PHONE PHONE - GOOGLE 6 2 0 0.00000000
19: 2015-05-24 2015-05-30 PHONE PHONE - OTHER 10 2 1 0.50000000
20: 2015-05-24 2015-05-30 PHONE PHONE - OTHER_DD 11 1 0 0.00000000
How can I use ddply and transform to convert the above into:
WEEK_START WEEK_END PHONE.TOTAL_CONTACTS CHAT.TOTAL_CONTACTS EMAIL.TOTAL_CONTACTS
2015-05-03 2015-05-09 sum(total_contacts) sum(total_contacts) sum(total_contacts)
2015-05-10 2015-05-17 " " "
2015-05-24 2015-05-30 " " "
where columns [, 3:5] are the COMM_TYPE_CODE values, and TOTAL_CONTACTS is summed by week start and week end?
Here is some sample data:
set.seed(1234)
foo <- data.frame(
WEEK_START = as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24")),
WEEK_END = as.Date(c("2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30")),
COMM_TYPE_CODE = c(rep("CHAT", 4), rep("EMAIL", 4), rep("PHONE", 4)),
TOTAL_CONTACTS = rbinom(12, 10000, .1))
Thanks!
Answer 0 (score: 2)
Try
library(reshape2)
dcast(foo, WEEK_START+WEEK_END~COMM_TYPE_CODE, value.var='TOTAL_CONTACTS' , sum)
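If reshape2 is not available, roughly the same wide table can be sketched in base R with aggregate() followed by reshape(). This is a swapped-in alternative, not what the answer itself uses, and the small `toy` data frame below is hypothetical, made up only so the sums are easy to check by hand.

```r
# Hypothetical toy data (not the asker's): week 1 has two CHAT rows to be summed
toy <- data.frame(
  WEEK_START = as.Date(c("2015-05-03", "2015-05-03", "2015-05-03",
                         "2015-05-10", "2015-05-10")),
  WEEK_END   = as.Date(c("2015-05-09", "2015-05-09", "2015-05-09",
                         "2015-05-16", "2015-05-16")),
  COMM_TYPE_CODE = c("CHAT", "CHAT", "EMAIL", "CHAT", "EMAIL"),
  TOTAL_CONTACTS = c(10, 5, 7, 3, 4)
)

# Step 1: sum TOTAL_CONTACTS within each week/channel combination (still long)
sums <- aggregate(TOTAL_CONTACTS ~ WEEK_START + WEEK_END + COMM_TYPE_CODE,
                  data = toy, FUN = sum)

# Step 2: spread the channels into columns, one row per week
wide <- reshape(sums,
                idvar = c("WEEK_START", "WEEK_END"),
                timevar = "COMM_TYPE_CODE",
                direction = "wide")
print(wide)
```

Note that reshape() names the new columns TOTAL_CONTACTS.CHAT and TOTAL_CONTACTS.EMAIL, i.e. value name first, which is the reverse of the CHAT.TOTAL_CONTACTS order shown in the question.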
For multiple value columns, you can use the devel version of data.table (v1.9.5), i.e.
set.seed(24)
foo$TOTAL_HMD <- sample(900:1200, 12, replace=FALSE)
library(data.table)#v1.9.5+
dcast(setDT(foo), WEEK_START+WEEK_END~COMM_TYPE_CODE,
value.var=c('TOTAL_CONTACTS', 'TOTAL_HMD'), sum)
# WEEK_START WEEK_END CHAT_sum_TOTAL_CONTACTS EMAIL_sum_TOTAL_CONTACTS
#1: 2015-05-03 2015-05-09 971 2033
#2: 2015-05-10 2015-05-16 1013 2027
#3: 2015-05-17 2015-05-23 1014 1975
#4: 2015-05-24 2015-05-30 987 1984
# CHAT_sum_TOTAL_HMD EMAIL_sum_TOTAL_HMD
#1: 988 2230
#2: 967 2146
#3: 1110 2058
#4: 1054 2131
Alternatively, with reshape2:
library(reshape2)
recast(foo, id.var=1:3, ...~COMM_TYPE_CODE+variable, value.var='value', sum)
Answer 1 (score: 2)
OK, so after some digging, I found this: Cast multiple value columns
Applied to the sample data:
set.seed(1234)
foo <- data.frame(
WEEK_START = as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24")),
WEEK_END = as.Date(c("2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30")),
COMM_TYPE_CODE = c(rep("CHAT", 4), rep("EMAIL", 4), rep("PHONE", 4)),
TOTAL_CONTACTS = rbinom(12, 10000, .1),
TOTAL_HMD = sample(900:1200, 12, replace=FALSE))
library(reshape2)
melt.foo <- melt(foo, id.vars = 1:3) # the first 3 columns are the id variables
pivot.foo <- dcast(melt.foo, WEEK_START+WEEK_END ~ COMM_TYPE_CODE + variable, fun.aggregate = sum)
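To make the melt-then-cast step easier to follow, here is a minimal sketch on a two-row data frame. It assumes reshape2 is installed; the `toy` data, and the names `long` and `wide`, are hypothetical and chosen only for illustration.

```r
library(reshape2)  # assumption: reshape2 is installed

# Hypothetical toy data: one week, two channels, two value columns
toy <- data.frame(
  WEEK_START = as.Date(c("2015-05-03", "2015-05-03")),
  WEEK_END   = as.Date(c("2015-05-09", "2015-05-09")),
  COMM_TYPE_CODE = c("CHAT", "EMAIL"),
  TOTAL_CONTACTS = c(10, 7),
  TOTAL_HMD      = c(2, 3)
)

# melt() stacks the two value columns into a "variable"/"value" pair,
# keeping the first three columns as identifiers
long <- melt(toy, id.vars = 1:3)

# dcast() then builds one column per channel/variable combination
wide <- dcast(long, WEEK_START + WEEK_END ~ COMM_TYPE_CODE + variable,
              value.var = "value", fun.aggregate = sum)
print(names(wide))
```

Each original value column ends up once per channel, so two channels and two value columns give four value columns in the wide result.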
Answer 2 (score: 0)
Note that there is a 1-1 relationship between the week start and week end dates, so you don't really need two row identifiers. Why not:
with (foo, tapply(TOTAL_CONTACTS,
INDEX= list( WeekStart_End= paste( WEEK_START, WEEK_END, sep=" - "),
Sum_CONTACT_CHANNEL=COMM_TYPE_CODE),
FUN=sum) )
#----------------
Sum_CONTACT_CHANNEL
WeekStart_End CHAT EMAIL PHONE
2015-05-03 - 2015-05-09 971 1025 1008
2015-05-10 - 2015-05-16 1013 1025 1002
2015-05-17 - 2015-05-23 1014 967 1008
2015-05-24 - 2015-05-30 987 973 1011
Judging by the format of the row numbers, your object may not be a plain data frame but a 'data.table', although data.tables do inherit from data.frames.
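The inheritance point can be checked directly. This is a small sketch assuming the data.table package is installed; `df` and `dt` are hypothetical names.

```r
library(data.table)  # assumption: data.table is installed

df <- data.frame(x = 1:3)
dt <- as.data.table(df)

# A data.table carries both classes, so data.frame methods still apply to it
print(class(dt))
is_df <- inherits(dt, "data.frame")
print(is_df)

# Printing a data.table shows row numbers followed by a colon ("1:", "2:", ...),
# which is the format visible in the question's output
print(dt)
```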