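I have a data frame of more than 3500 projects, and I want to grep the Project_Description column for 40 keywords. If a Project_Description contains one or more of the keywords, I want to create a new column and tag that project's row with the keyword(s).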
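How can I write an if statement that loops over the keywords and, whenever a keyword is found, tags the correct row with that keyword? In particular, how do I handle a Project_Description that may contain more than one of the keywords?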
So far I have only managed to extract the project rows whose Project_Description column contains at least one of the keywords:
key_words <- c("who", "what", "when", "where", "why")  # etc., 40 keywords in total
dataframe_key_words <- c()
for (i in seq_along(key_words)) {
  # append every row whose description matches the i-th keyword
  dataframe_key_words <- rbind(dataframe_key_words,
                               dataframe_original[grep(key_words[i], dataframe_original$Project_Description), ])
}
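The loop appends rows once per matching keyword, so a project whose description mentions two keywords ends up in dataframe_key_words twice. A minimal loop-free sketch of the same filter is below. The toy dataframe_original and its Project_ID column are hypothetical stand-ins for the real data, and the keywords are assumed to be plain words with no regex metacharacters.

```r
# Hypothetical toy stand-in for the real 3500-row data frame
dataframe_original <- data.frame(
  Project_ID = 1:4,
  Project_Description = c("who funds this and why",
                          "a project about where to build",
                          "nothing relevant here",
                          "why not"),
  stringsAsFactors = FALSE
)
key_words <- c("who", "what", "when", "where", "why")

# One alternation regex: \b(who|what|when|where|why)\b
# The \b word boundaries stop "who" from matching inside e.g. "whole".
pat <- paste0("\\b(", paste(key_words, collapse = "|"), ")\\b")

# grepl() tests every description in a single vectorised call, so each
# matching row is kept exactly once, even if it contains several keywords.
dataframe_key_words <- dataframe_original[grepl(pat, dataframe_original$Project_Description,
                                                perl = TRUE), ]
```

Here row 1 mentions both "who" and "why" but is returned only once, whereas the rbind loop would return it twice.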
Answer (score: 0)
You could try the following:
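library(data.table)
library(stringi)

key_words <- c("where", "why")
# build one alternation regex with a capture group: "(where|why)"
pat <- paste0("(", paste0(key_words, collapse = "|"), ")")
DT <- data.table(descr = c("where is the sample data? why do you do this?",
                           "this doesn't have any of the keywords"))
# stri_match_all_regex() returns one matrix per description; its second column
# holds every captured keyword (NA if nothing matched), so kw becomes a list
# column containing all keywords found in each row
DT[, kw := lapply(stri_match_all_regex(descr, pat), function(x) x[, 2])][]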
#                                            descr        kw
# 1: where is the sample data? why do you do this? where,why
# 2:         this doesn't have any of the keywords        NA
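For readers not using data.table, here is a base-R sketch of the same idea that writes the matches into an ordinary character column (one comma-separated string per row), which is easier to export than a list column. Assumptions: the toy data frame is hypothetical, the keywords contain no regex metacharacters, and matching should ignore case and respect word boundaries (so "who" does not match "whole"). Those last two choices go beyond the answer above.

```r
# Hypothetical toy data; the real data would use the 40-keyword vector
dataframe_original <- data.frame(
  Project_ID = 1:3,
  Project_Description = c("Where is the sample data? Why do you do this?",
                          "A whole project about nothing",
                          "who knows"),
  stringsAsFactors = FALSE
)
key_words <- c("who", "what", "when", "where", "why")

# \b word boundaries keep "who" from matching inside "whole"
pat <- paste0("\\b(", paste(key_words, collapse = "|"), ")\\b")

# gregexpr() locates every match in each description; regmatches() extracts
# them as a list of character vectors (character(0) when nothing matched)
hits <- regmatches(dataframe_original$Project_Description,
                   gregexpr(pat, dataframe_original$Project_Description,
                            perl = TRUE, ignore.case = TRUE))

# lower-case and de-duplicate each row's hits, then collapse them into one
# comma-separated string; rows without any keyword get NA
dataframe_original$key_words <- vapply(hits, function(h) {
  h <- unique(tolower(h))
  if (length(h) == 0) NA_character_ else paste(h, collapse = ",")
}, character(1))
```

Drop `ignore.case = TRUE` if "Where" and "where" should be treated as different keywords.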