My data frame contains the following:
RECORDID FEEDBACK
1234 Phrase1
1234 Phrase2
1234 Phrase3
1234 ""
1234 notaPhrase but whole lots of words
I want to match a total of 6 phrases and combine the matches into a single column. For this example, the result I need is:
RECORDID NewColumn FEEDBACK
1234 Phrase1, Phrase2, Phrase3 notaPhrase but whole lots of words
How can I do this in R?
Answer 0: (score: 0)
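We can group by 'RECORDID', use a logical index to select the 'Phrase' elements of the 'FEEDBACK' column, and collapse them into a single string inside summarise. The code uses toString, which is shorthand for paste(..., collapse = ", ").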
library(dplyr)
library(stringr)
df1 %>%
    filter(!is.na(FEEDBACK) & FEEDBACK != "") %>%
    mutate(flag = str_detect(FEEDBACK, '^Phrase\\d+$')) %>%
    group_by(RECORDID) %>%
    summarise(NewColumn = toString(FEEDBACK[flag]),
              FEEDBACK = toString(FEEDBACK[!flag]))
# A tibble: 1 x 3
# RECORDID NewColumn FEEDBACK
# <int> <chr> <chr>
#1 1234 Phrase1, Phrase2, Phrase3 notaPhrase but whole lots of words
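The two building blocks in the pipeline above, the logical index and toString, can be tried on their own. The following is a minimal base R sketch, not part of the original answer; the vector name `feedback` and the use of grepl in place of str_detect are choices made for this illustration only.

```r
# Stand-alone illustration in base R (no packages needed).
# `feedback` is a hypothetical vector mirroring the non-empty FEEDBACK values.
feedback <- c("Phrase1", "Phrase2", "Phrase3",
              "notaPhrase but whole lots of words")

# TRUE where the whole string is "Phrase" followed by one or more digits
flag <- grepl("^Phrase\\d+$", feedback)

# Logical indexing keeps the elements where flag is TRUE (or FALSE for !flag);
# toString() then joins them into one string separated by ", "
new_column <- toString(feedback[flag])
rest <- toString(feedback[!flag])

print(flag)
print(new_column)
print(rest)
```

Inside summarise the same expressions are evaluated once per RECORDID group, which is why each group collapses to a single row.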
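Note: the solution above matches the actual words shown in the OP's post. If the intent is instead to count words, so that entries consisting of exactly one word are treated as phrases, use str_count: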
df1 %>%
    filter(!is.na(FEEDBACK) & FEEDBACK != "") %>%
    mutate(flag = str_count(FEEDBACK, '\\w+') == 1) %>%
    group_by(RECORDID) %>%
    summarise(NewColumn = toString(FEEDBACK[flag]),
              FEEDBACK = toString(FEEDBACK[!flag]))
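The question mentions six specific phrases. If those are known in advance, one possible variation is to flag rows by exact membership with %in% rather than by a regular expression or a word count. This is a sketch under that assumption and is not part of the original answer; the vector `known_phrases` and its entries "Phrase4" to "Phrase6" are hypothetical placeholders, since the post only shows three phrases.

```r
library(dplyr)

# Hypothetical list of the six target phrases; replace with the real ones
known_phrases <- c("Phrase1", "Phrase2", "Phrase3",
                   "Phrase4", "Phrase5", "Phrase6")

# Same example data as in the question
df1 <- data.frame(
  RECORDID = c(1234L, 1234L, 1234L, 1234L, 1234L),
  FEEDBACK = c("Phrase1", "Phrase2", "Phrase3", "",
               "notaPhrase but whole lots of words"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # drop missing and empty feedback before flagging
  filter(!is.na(FEEDBACK) & FEEDBACK != "") %>%
  # TRUE only when the entry is exactly one of the known phrases
  mutate(flag = FEEDBACK %in% known_phrases) %>%
  group_by(RECORDID) %>%
  summarise(NewColumn = toString(FEEDBACK[flag]),
            FEEDBACK = toString(FEEDBACK[!flag]))

print(result)
```

Exact membership avoids accidental matches on free text that happens to be a single word, at the cost of having to maintain the phrase list.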
Data used in the examples:
df1 <- structure(list(RECORDID = c(1234L, 1234L, 1234L, 1234L, 1234L),
    FEEDBACK = c("Phrase1", "Phrase2", "Phrase3", "",
        "notaPhrase but whole lots of words")),
    class = "data.frame", row.names = c(NA, -5L))