Question

My data frame has the following contents:

RECORDID FEEDBACK
1234     Phrase1
1234     Phrase2
1234     Phrase3
1234     ""
1234     notaPhrase but whole lots of words

I want to match a total of 6 phrases and then combine the matches into a single column. For this example, I need the result to be

    RECORDID NewColumn                 FEEDBACK
    1234     Phrase1, Phrase2, Phrase3 notaPhrase but whole lots of words

How can I do this in R?

Answer 1

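We can group by 'RECORDID', use a logical index to pick out the 'Phrase' elements, and collapse them into one string with toString (a wrapper around paste with collapse = ", ") inside summarise. The remaining non-empty 'FEEDBACK' values are collapsed the same way.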

library(dplyr)
library(stringr)
df1 %>%
   filter(!is.na(FEEDBACK) & FEEDBACK != "") %>%
   mutate(flag = str_detect(FEEDBACK, '^Phrase\\d+$')) %>%
   group_by(RECORDID) %>%
   summarise(NewColumn = toString(FEEDBACK[flag]),
             FEEDBACK = toString(FEEDBACK[!flag]))
# A tibble: 1 x 3
#   RECORDID NewColumn                 FEEDBACK                          
#      <int> <chr>                     <chr>                             
#1     1234 Phrase1, Phrase2, Phrase3 notaPhrase but whole lots of words

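Note: the solution above matches the literal pattern shown in the OP's post ('Phrase' followed by one or more digits). To classify by word count instead, use str_count; here an entry is flagged when it consists of exactly one word: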

df1 %>%
   filter(!is.na(FEEDBACK) & FEEDBACK != "") %>%
   mutate(flag = str_count(FEEDBACK, '\\w+') == 1) %>%
   group_by(RECORDID) %>%
   summarise(NewColumn = toString(FEEDBACK[flag]),
             FEEDBACK = toString(FEEDBACK[!flag]))
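
If the phrases to match are a known, fixed set (the question mentions six in total), a simple membership test with %in% may be enough. The following is a minimal sketch under that assumption: the vector name phrases is hypothetical and holds only the three values visible in the example, so it would need to be extended with the real list.

```r
library(dplyr)

# Example data, same values as the df1 used in this answer
df1 <- data.frame(
  RECORDID = c(1234L, 1234L, 1234L, 1234L, 1234L),
  FEEDBACK = c("Phrase1", "Phrase2", "Phrase3", "",
               "notaPhrase but whole lots of words"),
  stringsAsFactors = FALSE
)

# Hypothetical list of phrases to match (assumption: only three are shown
# in the question, so the remaining ones must be added by hand)
phrases <- c("Phrase1", "Phrase2", "Phrase3")

result <- df1 %>%
   # drop missing and empty feedback entries
   filter(!is.na(FEEDBACK) & FEEDBACK != "") %>%
   # TRUE when the whole entry is exactly one of the known phrases
   mutate(flag = FEEDBACK %in% phrases) %>%
   group_by(RECORDID) %>%
   # collapse matched and unmatched entries into separate columns
   summarise(NewColumn = toString(FEEDBACK[flag]),
             FEEDBACK = toString(FEEDBACK[!flag]))
```

Unlike the regular expression, %in% only matches entries that are exactly equal to one of the listed phrases, so differences in case or surrounding whitespace would need to be normalised first.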

Data

df1 <- structure(list(RECORDID = c(1234L, 1234L, 1234L, 1234L, 1234L
), FEEDBACK = c("Phrase1", "Phrase2", "Phrase3", "", "notaPhrase but whole lots of words"
)), class = "data.frame", row.names = c(NA, -5L))
