Better way to split one column into several, then gather the results?

Asked: 2015-12-30 05:53:05

Tags: r tidyr magrittr

I have a data frame that looks like this:

message.id,sender,recipients
1,A,B|C
2,A,B
3,B,C|D|Q

I want to split the recipients column on "|" and then gather the results to produce this:

message.id,sender,recipient
1,A,B
1,A,C
2,A,B
3,B,C
3,B,D
3,B,Q

Is there a cleaner way to do this kind of operation? Here is my current code:

library(dplyr)
library(stringr)
library(tidyr)

df <- data.frame(message.id = c(1,2,3),
                 sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))

max.splits = df$recipients %>% str_count("\\|") %>% max + 1

df %>% separate(recipients, as.character(1:max.splits), sep = "\\|") %>%
  gather(trash,recipient,-message.id,-sender) %>%
  select(message.id, sender, recipient) %>%
  filter(recipient %>% is.na == FALSE) %>%
  arrange(message.id)

4 Answers:

Answer 0 (score: 3)

I'm biased, but I'd suggest cSplit from the "splitstackshape" package.

Usage is simply:

library(splitstackshape)
cSplit(df, "recipients", "|", "long")
#    message.id sender recipients
# 1:          1      A          B
# 2:          1      A          C
# 3:          2      A          B
# 4:          3      B          C
# 5:          3      B          D
# 6:          3      B          Q

Alternatively, using "dplyr" for the pipe and "tidyr" for unnest, you could try:

library(dplyr)
library(tidyr)
df %>%
  mutate(recipients = as.character(recipients)) %>%         ## need character for strsplit
  mutate(recipients = strsplit(recipients, "|", TRUE)) %>%  ## Use `fixed = TRUE`
  unnest(recipients)                                        ## `unnest` goes to long form
# Source: local data frame [6 x 3]
# 
#   message.id sender recipients
#        (dbl) (fctr)      (chr)
# 1          1      A          B
# 2          1      A          C
# 3          2      A          B
# 4          3      B          C
# 5          3      B          D
# 6          3      B          Q
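
As a side note, the split-then-unnest pattern above can also be written in one step. The following is a minimal sketch, assuming a tidyr version that provides separate_rows(); the name `long` is only illustrative and is not part of the original answer.

```r
library(tidyr)

# Same example data as in the question. stringsAsFactors = FALSE keeps
# `recipients` as character regardless of the R version in use.
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# separate_rows() splits each cell on `sep` (a regular expression, hence the
# escaped pipe) and emits one row per piece, repeating the other columns.
long <- separate_rows(df, recipients, sep = "\\|")
long
```

The column keeps its original name (recipients), so a rename step may be needed if the singular name from the question is wanted.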

Answer 1 (score: 1)

We can use data.table:

library(data.table)
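# as.character() guards against `recipients` being a factor, which strsplit() rejects
setDT(df)[, list(recipient = unlist(strsplit(as.character(recipients), '[|]'))),
          by = .(message.id, sender)]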
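
To make the grouping step explicit, here is a rough base R sketch of the same idea: split each string, then repeat the key columns once per piece. The names `pieces`, `n`, and `long` are illustrative only, and the data frame is rebuilt with character columns as an assumption.

```r
# Example data from the question, with character (not factor) columns.
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Split on a literal "|"; fixed = TRUE avoids regex escaping.
pieces <- strsplit(df$recipients, "|", fixed = TRUE)

# Number of recipients per message.
n <- lengths(pieces)

# Repeat each row's keys once per piece and attach the flattened pieces.
long <- data.frame(message.id = rep(df$message.id, times = n),
                   sender = rep(df$sender, times = n),
                   recipient = unlist(pieces),
                   stringsAsFactors = FALSE)
long
```

This needs no packages, at the cost of listing the key columns by hand.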

Answer 2 (score: 1)

How about using plyr:

library(plyr)
ddply(df, .(message.id), function(d){
    cbind(
        sender = as.character(d$sender), 
        recipients = strsplit(as.character(d$recipients), "\\|")[[1]]
    )
})

Answer 3 (score: 1)

Here is a solution using dplyr and tidyr:

df <- data.frame(message.id = 1:3, sender = c("A","A","B"),
recipients = c("B|C","B","C|D|Q"))

Original data

  message.id sender recipients
1          1      A        B|C
2          2      A          B
3          3      B      C|D|Q

Code

df %>%
  separate(recipients, into = c("r1", "r2", "r3"), sep = "\\|") %>%  # shorter rows are padded with NA
  gather("sen", "recipient", r1:r3) %>%
  select(-sen) %>%
  filter(!is.na(recipient))

Result

  message.id sender recipient
1          1      A         B
2          2      A         B
3          3      B         C
4          1      A         C
5          3      B         D
6          3      B         Q
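
Note that the rows above come out grouped by the gathered column (r1, then r2, then r3) rather than by message. A minimal sketch of restoring message order, assuming a final arrange() like the one in the question's own pipeline and a tidyr version whose separate() accepts fill = "right" (used here only to silence the missing-pieces warning); `long` is an illustrative name:

```r
library(dplyr)
library(tidyr)

df <- data.frame(message.id = 1:3,
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

long <- df %>%
  # Split into three columns; rows with fewer pieces get NA on the right.
  separate(recipients, into = c("r1", "r2", "r3"), sep = "\\|", fill = "right") %>%
  # Stack r1..r3 into a single recipient column, then drop the key column.
  gather("sen", "recipient", r1:r3) %>%
  select(-sen) %>%
  # Remove the NA padding and put the rows back in message order.
  filter(!is.na(recipient)) %>%
  arrange(message.id)
long
```

This approach still hard-codes three target columns, so it only fits data where the maximum number of recipients is known in advance.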