I have a dictionary of terms:
terms <- c("hello world", "great job")
terms <- as.data.frame(terms)
I want to find the first match for each term in another data.frame that contains documents:
doc <- c("i would like to say hello worlds", "hey friends hello world everyone", "i'm looking for a great job", "great job")
docs <- as.data.frame(doc)
Expected result:
foundtext <- c("i would like to say hello worlds","i'm looking for a great job")
output <- cbind(terms, foundtext)
Thanks for your help!
Answer 0 (score: 0)
This solution is quite simple and it works. As I said, I am not using regular expressions.
doc <- c("i would like to say hello worlds", "hey friends hello world everyone", "i'm looking for a great job", "great job")
docs <- as.data.frame(doc)
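docs$match <- "not found"  # or just leave it empty
# terms is a data.frame, so loop over its column, not over the data.frame itself
for (i in terms$terms) {
  # fixed = TRUE matches the term literally instead of as a regular expression
  docs$new <- grepl(i, docs$doc, fixed = TRUE)
  docs$match[docs$new] <- i
}
docs <- subset(docs, select = 1:2)  # drop the helper column "new"
# keep only the first document found for each term
docs <- subset(docs, !duplicated(docs$match))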
docs
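As an alternative to the loop, here is a minimal sketch that builds the two-column table from the question directly. It assumes the terms should be matched literally (no regular expressions) and that a term with no match should yield NA. `first_match` is a hypothetical helper name introduced for illustration, not part of the original answer.

```r
# Data from the question, rebuilt so the example is self-contained.
terms <- data.frame(terms = c("hello world", "great job"),
                    stringsAsFactors = FALSE)
docs <- data.frame(doc = c("i would like to say hello worlds",
                           "hey friends hello world everyone",
                           "i'm looking for a great job",
                           "great job"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: return the first text that contains the term,
# or NA if no text contains it. fixed = TRUE means literal matching.
first_match <- function(term, texts) {
  hits <- grep(term, texts, fixed = TRUE, value = TRUE)
  if (length(hits) > 0) hits[1] else NA_character_
}

# One row per term, paired with the first document that contains it.
output <- data.frame(
  terms = terms$terms,
  foundtext = vapply(terms$terms, first_match, character(1),
                     texts = docs$doc, USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
output
```

Unlike the loop, this keeps one row per term even when nothing matches, which may or may not be what you want.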