I have a data frame that looks like this:
message.id,sender,recipients
1,A,B|C
2,A,B
3,B,C|D|Q
I want to split the recipients column on "|" and then gather the results to produce this:
message.id,sender,recipient
1,A,B
1,A,C
2,A,B
3,B,C
3,B,D
3,B,Q
Is there a cleaner way to do this kind of operation? Here is my current code:
library(dplyr)
library(stringr)
library(tidyr)
df <- data.frame(message.id = c(1,2,3),
                 sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))
max.splits = df$recipients %>% str_count("\\|") %>% max + 1
df %>% separate(recipients,1:max.splits, sep = "\\|") %>%
  gather(trash,recipient,-message.id,-sender) %>%
  select(message.id, sender, recipient) %>%
  filter(recipient %>% is.na == FALSE) %>%
  arrange(message.id)
Answer 0 (score: 3)
I'm biased, but I'd suggest cSplit from the "splitstackshape" package.
The usage is simply:
library(splitstackshape)
cSplit(df, "recipients", "|", "long")
#    message.id sender recipients
# 1:          1      A          B
# 2:          1      A          C
# 3:          2      A          B
# 4:          3      B          C
# 5:          3      B          D
# 6:          3      B          Q
Alternatively, using "dplyr" for the piping and "tidyr" for unnest, you could try:
library(dplyr)
library(tidyr)
df %>%
  mutate(recipients = as.character(recipients)) %>%              ## need character for strsplit
  mutate(recipients = strsplit(recipients, "|", fixed = TRUE)) %>%  ## Use `fixed = TRUE`
  unnest(recipients)                                             ## `unnest` goes to long form
# Source: local data frame [6 x 3]
#
#   message.id sender recipients
#        (dbl) (fctr)      (chr)
# 1          1      A          B
# 2          1      A          C
# 3          2      A          B
# 4          3      B          C
# 5          3      B          D
# 6          3      B          Q
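As a possible shorter variant of the strsplit-plus-unnest idea above, here is a minimal sketch that assumes a tidyr version providing separate_rows (a function not mentioned in the original answer; newer tidyr releases mark it as superseded, though it remains available). The result name long is just an illustrative choice.

```r
library(tidyr)

# Same sample data as in the question; stringsAsFactors = FALSE keeps
# recipients as character regardless of the R version in use.
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# sep is treated as a regular expression, so the pipe must be escaped.
# Each piece of recipients becomes its own row; other columns are repeated.
long <- separate_rows(df, recipients, sep = "\\|")
long
```

This keeps the column name recipients; rename it afterwards if the singular form is wanted.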
Answer 1 (score: 1)
We can use data.table:
library(data.table)
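# as.character() guards against recipients being a factor (the data.frame default before R 4.0)
setDT(df)[, list(recipient = unlist(strsplit(as.character(recipients), '[|]'))),
          .(message.id, sender)]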
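The split-then-unlist logic above can also be sketched in base R without any packages. This is only an illustration of the same idea, not part of the original answer; the names parts, n, and long are hypothetical.

```r
# Same sample data as in the question, with recipients kept as character.
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Split each recipients string on a literal "|" (fixed = TRUE avoids regex escaping).
parts <- strsplit(df$recipients, "|", fixed = TRUE)

# Number of recipients per message, used to repeat the other columns.
n <- lengths(parts)

# Repeat each id/sender once per recipient and flatten the list of pieces.
long <- data.frame(message.id = rep(df$message.id, n),
                   sender = rep(df$sender, n),
                   recipient = unlist(parts),
                   stringsAsFactors = FALSE)
long
```

The rep/unlist pairing relies on both preserving the original row order, so the repeated ids line up with the flattened recipients.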
Answer 2 (score: 1)
How about using plyr?
library(plyr)
ddply(df, .(message.id), function(d){
  cbind(
    sender = as.character(d$sender),
    recipients = strsplit(as.character(d$recipients), "\\|")[[1]]
  )
})
Answer 3 (score: 1)
Here is a solution using dplyr and tidyr:
df <- data.frame(message.id = 1:3, sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))
Original data
  message.id sender recipients
1          1      A        B|C
2          2      A          B
3          3      B      C|D|Q
Code
df %>% separate(recipients, into = c("r1","r2","r3")) %>%
  gather("sen","recipient",r1:r3) %>% select(-sen) %>%
  filter(!is.na(recipient))
Result
  message.id sender recipient
1          1      A         B
2          2      A         B
3          3      B         C
4          1      A         C
5          3      B         D
6          3      B         Q
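The code above hard-codes three target columns, which fits this sample data but would drop recipients from any message with more than three. A minimal sketch of one way to derive the column names from the data instead is shown below; the names n_max, into, and long are illustrative assumptions, and fill = "right" is used only to pad short rows with NA without a warning.

```r
library(dplyr)
library(tidyr)

df <- data.frame(message.id = 1:3, sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Largest number of recipients in any single message.
n_max <- max(lengths(strsplit(df$recipients, "|", fixed = TRUE)))

# One helper column name per possible recipient: "r1", "r2", ...
into <- paste0("r", seq_len(n_max))

long <- df %>%
  separate(recipients, into = into, sep = "\\|", fill = "right") %>%
  gather(key = "sen", value = "recipient", -message.id, -sender) %>%
  select(-sen) %>%
  filter(!is.na(recipient))
long
```

As in the original code, rows come out grouped by helper column rather than by message.id; add arrange(message.id) if message order matters.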