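I have two data frames: DF1 is a dictionary of words and DF2 contains sentences. I want to do text matching such that, if any word from DF1 matches any word in a DF2 sentence, the output has a column with "Yes" for that sentence, and "No" if nothing matches, as shown below: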
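(DF1) dictionary:
DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
(DF2) sentences:
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")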
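and the output should be:
Customer satisfaction index improvement (Yes)
reduction in retail cycle (No)
Improve market share (Yes)
% recovery from vendor (No)
Note: Yes and No go in a separate column that shows the text-matching result. Can anyone help? Thanks in advance.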
Answer 0: (score: 2)
You can do it like this:
df <- data.frame(sentence = c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor"))
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
# combine the words in a regular expression and bind it as column yes
df <- cbind(df, yes = grepl(paste(words, collapse = "|"), df$sentence))
Output:
                                 sentence   yes
1 Customer satisfaction index improvement  TRUE
2               reduction in retail cycle FALSE
3                    Improve market share  TRUE
4                  % recovery from vendor FALSE
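Note that the pattern above matches substrings and is case-sensitive, so "share" would also match inside "shareholder", and "Market" would not match "market". A minimal sketch of a stricter variant follows, assuming whole-word, case-insensitive matching is what the question wants. The column name `match` and the "Yes"/"No" labels are illustrative choices, not part of the original answer.

```r
# Same data as in the answer above.
df <- data.frame(sentence = c("Customer satisfaction index improvement",
                              "reduction in retail cycle",
                              "Improve market share",
                              "% recovery from vendor"),
                 stringsAsFactors = FALSE)
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

# Wrap the alternation in \\b ... \\b so only whole words match.
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# ignore.case = TRUE lets "Market" match "market";
# ifelse() turns the logical result into the Yes/No labels.
is_match <- grepl(pattern, df$sentence, ignore.case = TRUE, perl = TRUE)
df$match <- ifelse(is_match, "Yes", "No")

print(df)
```

If the dictionary words may contain regex metacharacters (for example "."), they would need escaping before being pasted into the pattern.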
Answer 1: (score: 1)
Try this:
DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")
result <- cbind(DF2, "word found" = ifelse(rowSums(sapply(DF1, grepl, x = DF2)) > 0, "YES", "NO"))
> result
     DF2                                       word found
[1,] "Customer satisfaction index improvement" "YES"
[2,] "reduction in retail cycle"               "NO"
[3,] "Improve market share"                    "YES"
[4,] "% recovery from vendor"                  "NO"
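The cbind() call above returns a character matrix, and grepl matches substrings. As an alternative, here is a rough sketch that splits each sentence into words and checks them against the dictionary with %in%, returning a data frame. The tokenization rule (lower-casing and splitting on anything other than letters and digits) and the names `tokens`, `found` and `word_found` are assumptions made for illustration.

```r
DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle",
         "Improve market share", "% recovery from vendor")

# Lower-case each sentence and split it on runs of non-alphanumeric characters.
tokens <- strsplit(tolower(DF2), "[^a-z0-9]+")

# A sentence counts as matched when at least one token is in the dictionary.
found <- vapply(tokens, function(tok) any(tok %in% DF1), logical(1))

result <- data.frame(sentence = DF2,
                     word_found = ifelse(found, "YES", "NO"),
                     stringsAsFactors = FALSE)
print(result)
```

Because whole tokens are compared, no regex escaping is needed, but multi-word dictionary entries would not be found this way.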