Create a new column containing the grepped strings

Date: 2019-04-04 18:41:25

Tags: r string-matching

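I have a data frame of more than 3,500 projects, and I want to grep the Project_Description column for 40 keywords. If a Project_Description contains one or more of the keywords, I want to create a new column that labels that project's row with the keyword(s) found.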

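How can I write an if statement that loops over the keywords and, when a keyword is found, labels the matching row with it? In particular, how do I handle a Project_Description that may contain more than one of the keywords?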

So far, I have managed to extract the project rows whose Project_Description contains at least one of the keywords:

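key_words <- c("who", "what", "when", "where", "why")  # ...plus the remaining keywords

dataframe_key_words <- c()

# Append the rows matching each keyword in turn. A row containing several
# keywords is appended once per keyword, so it can appear more than once.
for (i in seq_along(key_words)) {
  dataframe_key_words <- rbind(dataframe_key_words,
                               dataframe_original[grep(key_words[i], dataframe_original$Project_Description), ])
}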
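To label each row instead of collecting subsets, one option is to keep the loop over keywords but write into a new column. This is a minimal base-R sketch, not the asker's code: the three-row `dataframe_original` and the `Key_Words` column name are hypothetical stand-ins, and `fixed = TRUE` assumes the keywords are plain strings rather than regular expressions.

```r
# Hypothetical toy data standing in for the real 3,500-row data frame
dataframe_original <- data.frame(
  Project_Description = c("who built this and why",
                          "no matching terms here",
                          "where is it"),
  stringsAsFactors = FALSE
)
key_words <- c("who", "what", "when", "where", "why")

# Hypothetical label column: NA until at least one keyword is found
dataframe_original$Key_Words <- NA_character_

for (k in key_words) {
  # TRUE for every row whose description contains keyword k (literal match)
  hit <- grepl(k, dataframe_original$Project_Description, fixed = TRUE)
  # Start a new label, or append k to labels set by earlier keywords
  dataframe_original$Key_Words[hit] <- ifelse(
    is.na(dataframe_original$Key_Words[hit]),
    k,
    paste(dataframe_original$Key_Words[hit], k, sep = ",")
  )
}

print(dataframe_original)
```

Here `Key_Words` ends up as "who,why", NA and "where". The rows with `!is.na(Key_Words)` are the same rows the rbind loop collects, but each appears only once.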

1 Answer

Answer (score: 0)

You could try the following:

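library(data.table)
library(stringi)

key_words <- c("where", "why")
# Build one alternation pattern with a capture group: "(where|why)"
pat <- paste0("(", paste0(key_words, collapse = "|"), ")")
DT <- data.table(descr = c("where is the sample data? why do you do this?",
                           "this doesn't have any of the keywords"))
# stri_match_all_regex() returns one matrix per string; column 2 holds the
# captured keyword, so kw becomes a list column (NA when nothing matched)
DT[, kw := lapply(stri_match_all_regex(descr, pat), function(x) x[, 2])][]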

#                                            descr        kw
# 1: where is the sample data? why do you do this? where,why
# 2:         this doesn't have any of the keywords        NA
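A pattern built this way also matches keywords embedded in longer words (with "who" in the list, "whole" would be tagged), and `kw` is a list column rather than plain text. A base-R sketch that addresses both, assuming word-boundary matching is what is wanted: `\\b` anchors each keyword, and each row's matches are collapsed into one comma-separated string. The extra keyword "who" and the third description are hypothetical additions for illustration.

```r
key_words <- c("where", "why", "who")
descr <- c("where is the sample data? why do you do this?",
           "this doesn't have any of the keywords",
           "the whole team agreed")

# \\b anchors each keyword at word boundaries, so "who" does not match "whole"
pat <- paste0("\\b(", paste(key_words, collapse = "|"), ")\\b")

# gregexpr() locates every match in each string; regmatches() extracts them,
# giving a list with one character vector per description
matches <- regmatches(descr, gregexpr(pat, descr, perl = TRUE))

# Collapse each element to a single string; NA when nothing matched
kw <- vapply(matches,
             function(m) if (length(m) == 0) NA_character_ else paste(unique(m), collapse = ","),
             character(1), USE.NAMES = FALSE)
print(kw)
```

This yields "where,why", NA and NA. The resulting character vector can be assigned as a column in the usual way, for example `DT[, kw := ...]` with the answer's data.table.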