Question

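I'm new to R and have been struggling with this problem. I want to create a new column that checks whether any of a set of words ("foo", "x", "y") appears in the column 'text', and then writes the matching value(s) into the new column.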

I have a data frame that looks like this: a ->

 id     text        time   username
 1     "hello x"     10     "me"
 2     "foo and y"   5      "you"
 3     "nothing"     15     "everyone"
 4     "x,y,foo"     0      "know"

The correct output should be:

a2 ->

 id     text        time   username      keywordtag
 1     "hello x"     10     "me"          x
 2     "foo and y"   5      "you"         foo,y
 3     "nothing"     15     "everyone"    0
 4     "x,y,foo"     0      "know"        x,y,foo

I have this:

df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"))
terms <- c('foo', 'x', 'y')
df1$keywordtag <- apply(sapply(terms, grepl, df1$text), 1, function(x) paste(terms[x], collapse=','))

This works, but it crashes R when my needleList contains 12k words and my text has 155k rows. Is there a way to do this that won't crash R?

Answer 1

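This is a variation on what you did and on what was suggested in the comments. It uses dplyr and stringr. There may be a more efficient way, but this one is less likely to crash your R session.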
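For context, here is a rough back-of-the-envelope estimate of why the original approach is likely to run out of memory. `sapply(terms, grepl, df1$text)` builds a logical matrix with one row per text and one column per term, and R stores each logical value in 4 bytes. The sizes below are the ones given in the question; the variable names are only for illustration.

```r
n_rows  <- 155000   # number of text rows, from the question
n_terms <- 12000    # number of search terms, from the question
bytes_per_logical <- 4  # R stores a logical as a 4-byte integer

# Approximate size of the rows-by-terms logical matrix, in GiB
size_gib <- n_rows * n_terms * bytes_per_logical / 1024^3
size_gib  # roughly 6.9
```

That is about 7 GiB for the matrix alone, before `apply` makes any copies. The code that follows never builds that matrix.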

library(dplyr)
library(stringr)

terms      <- c('foo', 'x', 'y')
term_regex <- paste0('(', paste(terms, collapse = '|'), ')')

### Solution: this uses dplyr::mutate and stringr::str_extract_all
df1 %>%
    mutate(keywordtag = sapply(str_extract_all(text, term_regex), function(x) paste(x, collapse=',')))
#       text keywordtag
#1   hello x          x
#2 foo and y      foo,y
#3   nothing           
#4   x,y,foo    x,y,foo
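If a single alternation of 12k terms turns out to be slow, another option might be to skip regular expressions for the matching step entirely. The following is a minimal base-R sketch, not a benchmarked solution. It assumes the keywords are whole words separated by non-alphanumeric characters, so unlike the regex above it will not find 'x' inside a longer word such as "box", and it lists each keyword only once per row.

```r
df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"),
                  stringsAsFactors = FALSE)
terms <- c("foo", "x", "y")

# Split every text into tokens on runs of non-alphanumeric characters
tokens <- strsplit(df1$text, "[^[:alnum:]]+")

# For each row, keep the tokens that are in the term list (order of
# appearance, duplicates dropped) and paste them together
df1$keywordtag <- vapply(
  tokens,
  function(tok) paste(unique(tok[tok %in% terms]), collapse = ","),
  character(1)
)

df1
#       text keywordtag
#1   hello x          x
#2 foo and y      foo,y
#3   nothing
#4   x,y,foo    x,y,foo
```

`%in%` uses hashed lookup against `terms`, so the work per row depends on the number of tokens in that row rather than on the number of terms, and no rows-by-terms matrix is created.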

R: Extract and paste keyword matches
