How to move factor levels into columns?

Date: 2015-05-28 19:16:40

Tags: r plyr

I have the following data.frame:

> tail(contacts.byChannel.weekly, 20)

    WEEK_START   WEEK_END COMM_TYPE_CODE  CONTACT_CHANNEL TOTAL_CONTACTS TOTAL_HMD TOTAL_HMD_NO        NRR
 1: 2015-05-03 2015-05-09          PHONE    PHONE - OTHER            326       104           14 0.13461538
 2: 2015-05-03 2015-05-09          PHONE PHONE - OTHER_DD            313        89            8 0.08988764
 3: 2015-05-10 2015-05-16           CHAT             CHAT            576       132           20 0.15151515
 4: 2015-05-10 2015-05-16          EMAIL            EMAIL            933       124           37 0.29838710
 5: 2015-05-10 2015-05-16          PHONE      PHONE - C2C            203        50           12 0.24000000
 6: 2015-05-10 2015-05-16          PHONE   PHONE - GOOGLE            197        48            3 0.06250000
 7: 2015-05-10 2015-05-16          PHONE    PHONE - OTHER            487       166           25 0.15060241
 8: 2015-05-10 2015-05-16          PHONE PHONE - OTHER_DD            334        90           12 0.13333333
 9: 2015-05-17 2015-05-23           CHAT             CHAT            568       107           17 0.15887850
10: 2015-05-17 2015-05-23          EMAIL            EMAIL           1023       141           39 0.27659574
11: 2015-05-17 2015-05-23          PHONE      PHONE - C2C            156        44            5 0.11363636
12: 2015-05-17 2015-05-23          PHONE   PHONE - GOOGLE            224        46            7 0.15217391
13: 2015-05-17 2015-05-23          PHONE    PHONE - OTHER            553       165           11 0.06666667
14: 2015-05-17 2015-05-23          PHONE PHONE - OTHER_DD            386       108           11 0.10185185
15: 2015-05-24 2015-05-30           CHAT             CHAT             25         2            1 0.50000000
16: 2015-05-24 2015-05-30          EMAIL            EMAIL             33         3            2 0.66666667
17: 2015-05-24 2015-05-30          PHONE      PHONE - C2C              8         0            0        NaN
18: 2015-05-24 2015-05-30          PHONE   PHONE - GOOGLE              6         2            0 0.00000000
19: 2015-05-24 2015-05-30          PHONE    PHONE - OTHER             10         2            1 0.50000000
20: 2015-05-24 2015-05-30          PHONE PHONE - OTHER_DD             11         1            0 0.00000000

How can I use ddply & transform to convert the above into:

WEEK_START    WEEK_END    PHONE.TOTAL_CONTACTS  CHAT.TOTAL_CONTACTS  EMAIL.TOTAL_CONTACTS     
2015-05-03    2015-05-09  sum(total_contacts)   sum(total_contacts)  sum(total_contacts)
2015-05-10    2015-05-16  "                     "                    "
2015-05-24    2015-05-30  "                     "                    "

where columns [, 3:5] are the COMM_TYPE_CODE levels, and the values are summed per week start & end?

Here is some sample data:

set.seed(1234)
foo <- data.frame(
  WEEK_START = as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24")),
  WEEK_END = as.Date(c("2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30")),
  COMM_TYPE_CODE = c(rep("CHAT", 4), rep("EMAIL", 4), rep("PHONE", 4)),
  TOTAL_CONTACTS = rbinom(12, 10000, .1))

Thanks!
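For comparison with the ddply & transform route the question asks about, here is a minimal sketch assuming the plyr package is installed. Note that transform keeps one row per input row, so collapsing to one row per week needs plyr's summarise instead; the sum(x[cond]) pattern and the name wide_plyr are illustrative choices, not taken from the post. foo is rebuilt more compactly, but with the same seed and values as the sample data above.

```r
library(plyr)

# Rebuild the question's sample data (same seed, same values)
set.seed(1234)
starts <- as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24"))
foo <- data.frame(
  WEEK_START     = rep(starts, 3),
  WEEK_END       = rep(starts + 6, 3),
  COMM_TYPE_CODE = rep(c("CHAT", "EMAIL", "PHONE"), each = 4),
  TOTAL_CONTACTS = rbinom(12, 10000, .1))

# One output row per (WEEK_START, WEEK_END); one summed column per COMM_TYPE_CODE level
wide_plyr <- ddply(foo, .(WEEK_START, WEEK_END), summarise,
  CHAT.TOTAL_CONTACTS  = sum(TOTAL_CONTACTS[COMM_TYPE_CODE == "CHAT"]),
  EMAIL.TOTAL_CONTACTS = sum(TOTAL_CONTACTS[COMM_TYPE_CODE == "EMAIL"]),
  PHONE.TOTAL_CONTACTS = sum(TOTAL_CONTACTS[COMM_TYPE_CODE == "PHONE"]))
print(wide_plyr)
```

The drawback is that every level has to be spelled out by hand, which is why the reshaping functions in the answers scale better.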

3 Answers:

Answer 0 (score: 2)

Try

library(reshape2)
dcast(foo, WEEK_START+WEEK_END~COMM_TYPE_CODE, value.var='TOTAL_CONTACTS' , sum)
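Why the trailing sum matters: in the sample data each week/COMM_TYPE_CODE cell holds exactly one row, but in the real data from the question PHONE has several CONTACT_CHANNEL rows per week. A small sketch, assuming reshape2 is installed (d, counts and sums are illustrative names), using six rows copied from the printout in the question: without fun.aggregate, dcast falls back to length and counts the rows instead of adding them.

```r
library(reshape2)

# Six rows copied from the question's printout: two PHONE channels per week
d <- data.frame(
  WEEK_START      = as.Date(rep(c("2015-05-10", "2015-05-17"), each = 3)),
  WEEK_END        = as.Date(rep(c("2015-05-16", "2015-05-23"), each = 3)),
  COMM_TYPE_CODE  = rep(c("CHAT", "PHONE", "PHONE"), 2),
  CONTACT_CHANNEL = rep(c("CHAT", "PHONE - C2C", "PHONE - GOOGLE"), 2),
  TOTAL_CONTACTS  = c(576, 203, 197, 568, 156, 224))

# No fun.aggregate: dcast reports "Aggregation function missing: defaulting to length"
# and each cell holds the number of rows (PHONE = 2), not their total
counts <- dcast(d, WEEK_START + WEEK_END ~ COMM_TYPE_CODE, value.var = "TOTAL_CONTACTS")

# With sum: PHONE becomes 203 + 197 = 400 and 156 + 224 = 380
sums <- dcast(d, WEEK_START + WEEK_END ~ COMM_TYPE_CODE,
              value.var = "TOTAL_CONTACTS", fun.aggregate = sum)
print(sums)
```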

For multiple value columns, you can use the devel version of data.table (v1.9.5), i.e.

set.seed(24)
foo$TOTAL_HMD <- sample(900:1200, 12, replace=FALSE)
library(data.table) # v1.9.5+
dcast(setDT(foo), WEEK_START+WEEK_END~COMM_TYPE_CODE,
      value.var=c('TOTAL_CONTACTS', 'TOTAL_HMD'), sum)
#   WEEK_START   WEEK_END CHAT_sum_TOTAL_CONTACTS EMAIL_sum_TOTAL_CONTACTS
#1: 2015-05-03 2015-05-09                     971                     2033
#2: 2015-05-10 2015-05-16                    1013                     2027
#3: 2015-05-17 2015-05-23                    1014                     1975
#4: 2015-05-24 2015-05-30                     987                     1984
#   CHAT_sum_TOTAL_HMD EMAIL_sum_TOTAL_HMD
#1:                988                2230
#2:                967                2146
#3:               1110                2058
#4:               1054                2131
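If a devel data.table is not at hand, a base-R sketch of the same multi-value reshape is possible with aggregate and stats::reshape. This is a swapped-in technique, not the answerer's; it assumes foo carries both TOTAL_CONTACTS and TOTAL_HMD as in the snippet above, and the output names (e.g. TOTAL_CONTACTS.CHAT) follow reshape's own "value.time" naming rather than data.table's.

```r
# Rebuild the sample data plus the extra TOTAL_HMD column
set.seed(1234)
starts <- as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24"))
foo <- data.frame(
  WEEK_START     = rep(starts, 3),
  WEEK_END       = rep(starts + 6, 3),
  COMM_TYPE_CODE = rep(c("CHAT", "EMAIL", "PHONE"), each = 4),
  TOTAL_CONTACTS = rbinom(12, 10000, .1))
set.seed(24)
foo$TOTAL_HMD <- sample(900:1200, 12, replace = FALSE)

# Long format: both value columns summed per week and communication type
agg <- aggregate(cbind(TOTAL_CONTACTS, TOTAL_HMD) ~ WEEK_START + WEEK_END + COMM_TYPE_CODE,
                 data = foo, FUN = sum)

# Wide format: one TOTAL_CONTACTS.<level> and one TOTAL_HMD.<level> column per level
wide <- reshape(agg, idvar = c("WEEK_START", "WEEK_END"), timevar = "COMM_TYPE_CODE",
                v.names = c("TOTAL_CONTACTS", "TOTAL_HMD"), direction = "wide")
rownames(wide) <- NULL
print(wide)
```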

Or, using recast from reshape2:

library(reshape2)
recast(foo, id.var=1:3, ...~COMM_TYPE_CODE+variable, value.var='value', sum)
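recast melts and casts in a single call. A minimal sketch, assuming reshape2 is installed and foo has the TOTAL_HMD column, that compares it with the explicit two-step version; fun.aggregate = sum is written out by name here only for readability.

```r
library(reshape2)

# Sample data with both value columns
set.seed(1234)
starts <- as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24"))
foo <- data.frame(
  WEEK_START     = rep(starts, 3),
  WEEK_END       = rep(starts + 6, 3),
  COMM_TYPE_CODE = rep(c("CHAT", "EMAIL", "PHONE"), each = 4),
  TOTAL_CONTACTS = rbinom(12, 10000, .1),
  TOTAL_HMD      = sample(900:1200, 12, replace = FALSE))

# Two steps: long format (one row per id combination and measure), then cast wide;
# "..." in the formula stands for the remaining id columns (WEEK_START, WEEK_END)
molten   <- melt(foo, id.vars = 1:3)
two_step <- dcast(molten, ... ~ COMM_TYPE_CODE + variable,
                  value.var = "value", fun.aggregate = sum)

# One step: recast performs the melt internally
one_step <- recast(foo, ... ~ COMM_TYPE_CODE + variable, id.var = 1:3,
                   fun.aggregate = sum)
print(names(one_step))
```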


Answer 1 (score: 2)

OK, so after some digging, I found: Cast multiple value columns

Applied here:

set.seed(1234)
foo <- data.frame(
  WEEK_START = as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24", "2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24")),
  WEEK_END = as.Date(c("2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30", "2015-05-09", "2015-05-16", "2015-05-23", "2015-05-30")),
  COMM_TYPE_CODE = c(rep("CHAT", 4), rep("EMAIL", 4), rep("PHONE", 4)),
  TOTAL_CONTACTS = rbinom(12, 10000, .1),
  TOTAL_HMD = sample(900:1200, 12, replace=FALSE))

library(reshape2)
melt.foo <- melt(foo, id.vars = 1:3)  # first 3 columns are the identifiers
pivot.foo <- dcast(melt.foo, WEEK_START+WEEK_END ~ COMM_TYPE_CODE + variable, fun.aggregate = sum)

Answer 2 (score: 0)

Given that there is a 1-1 relationship between week start and end dates, you don't really need two row identifiers. Why not:

with (foo,  tapply(TOTAL_CONTACTS,
              INDEX=   list( WeekStart_End= paste( WEEK_START, WEEK_END, sep=" - "),
                             Sum_CONTACT_CHANNEL=COMM_TYPE_CODE),
                 FUN=sum) )

#----------------
                         Sum_CONTACT_CHANNEL
WeekStart_End             CHAT EMAIL PHONE
  2015-05-03 - 2015-05-09  971  1025  1008
  2015-05-10 - 2015-05-16 1013  1025  1002
  2015-05-17 - 2015-05-23 1014   967  1008
  2015-05-24 - 2015-05-30  987   973  1011
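If separate WEEK_START and WEEK_END columns are still wanted downstream, the combined row labels can be split back apart afterwards. A minimal base-R sketch; m, parts and out are illustrative names, and it relies on the " - " separator used in the paste call above.

```r
# Rebuild the question's sample data
set.seed(1234)
starts <- as.Date(c("2015-05-03", "2015-05-10", "2015-05-17", "2015-05-24"))
foo <- data.frame(
  WEEK_START     = rep(starts, 3),
  WEEK_END       = rep(starts + 6, 3),
  COMM_TYPE_CODE = rep(c("CHAT", "EMAIL", "PHONE"), each = 4),
  TOTAL_CONTACTS = rbinom(12, 10000, .1))

# Same idea as the tapply call above: one combined "start - end" label per week
m <- with(foo, tapply(TOTAL_CONTACTS,
                      list(WEEK = paste(WEEK_START, WEEK_END, sep = " - "),
                           TYPE = COMM_TYPE_CODE),
                      sum))

# Split the labels back into two Date columns and attach the summed matrix
parts <- do.call(rbind, strsplit(rownames(m), " - ", fixed = TRUE))
out <- data.frame(WEEK_START = as.Date(parts[, 1]),
                  WEEK_END   = as.Date(parts[, 2]),
                  m)
rownames(out) <- NULL
print(out)
```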

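Note that the row-number format in your printout ("1:", "2:", ...) suggests this is probably not a plain data frame but a 'data.table', although data.tables do inherit from data.frames.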
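A quick way to check which kind of object you have, assuming the data.table package is installed (df and dt are illustrative names):

```r
library(data.table)

df <- data.frame(x = 1:3)
dt <- as.data.table(df)

cls <- class(dt)
print(cls)                        # "data.table" "data.frame"
print(inherits(dt, "data.frame")) # TRUE: data.frame methods still apply
print(dt)                         # rows are labelled "1:", "2:", "3:" as in the question's printout
```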