Question

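I have two data frames: DF1 is a dictionary of words and DF2 contains sentences. I want to do text matching so that if any word in DF1 matches any word in a DF2 sentence, an output column shows "Yes" for that sentence, and "No" if nothing matches, as shown below: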

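(DF1) dictionary:

DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

(DF2) sentences:

DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")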

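And the output should be:

Customer satisfaction index improvement (Yes)

reduction in retail cycle (No)

Improve market share (Yes)

% recovery from vendor (No)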

Note: Yes and No belong in a separate column that shows the text-matching result. Can anyone help? Thanks in advance.

Answer 1

You can do it like this:

df <- data.frame(sentence = c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor"))
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

# combine the words in a regular expression and bind it as column yes
df <- cbind(df, yes = grepl(paste(words, collapse = "|"), df$sentence))

Output:

                                 sentence   yes
1 Customer satisfaction index improvement  TRUE
2               reduction in retail cycle FALSE
3                    Improve market share  TRUE
4                  % recovery from vendor FALSE

See it working on ideone.com.
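One caveat with the pattern above: grepl() is case-sensitive by default and matches substrings, so "share" would also match inside "shareholder". A minimal sketch, assuming whole-word, case-insensitive matching is what is wanted and that the dictionary words contain no regex metacharacters (the names `pattern` and `match` are illustrative, not from the answer), wraps the alternation in \\b word boundaries and maps the logical result to "Yes"/"No":

```r
df <- data.frame(
  sentence = c("Customer satisfaction index improvement",
               "reduction in retail cycle",
               "Improve market share",
               "% recovery from vendor"),
  stringsAsFactors = FALSE
)
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

# \\b anchors the alternation at word boundaries,
# so "share" does not match inside "shareholder"
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# ignore.case = TRUE treats "Market" and "market" as the same word
df$match <- ifelse(grepl(pattern, df$sentence, ignore.case = TRUE), "Yes", "No")
df
```

Whether substring matches should count depends on the data; dropping the \\b anchors restores the behaviour of the original one-liner.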

Answer 2

Try this:

DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")


result <- cbind(DF2, "word found" = ifelse(rowSums(sapply(DF1, grepl, x = DF2)) > 0, "YES", "NO"))

> result
     DF2                                       word found
[1,] "Customer satisfaction index improvement" "YES"     
[2,] "reduction in retail cycle"               "NO"      
[3,] "Improve market share"                    "YES"     
[4,] "% recovery from vendor"                  "NO"
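Note that cbind() on two character vectors returns a character matrix rather than a data frame. As a hedged alternative sketch (not part of the answer above), assuming exact whole-word, case-insensitive matching is acceptable, the sentences can be split into tokens and compared against the dictionary with %in%, which avoids regular-expression matching against the dictionary entirely. The names `tokens`, `found`, and `word_found` are chosen here for illustration:

```r
DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle",
         "Improve market share", "% recovery from vendor")

# Split each lower-cased sentence into tokens on runs of non-alphanumeric characters
tokens <- strsplit(tolower(DF2), "[^[:alnum:]]+")

# A sentence counts as a match if any of its tokens is in the dictionary
found <- vapply(tokens, function(tok) any(tok %in% DF1), logical(1))

# Build a data frame so the result column keeps its own type and name
result <- data.frame(sentence = DF2,
                     word_found = ifelse(found, "YES", "NO"),
                     stringsAsFactors = FALSE)
result
```

Because the comparison is on whole tokens, "Improve" does not match "improvement" here; the third sentence is flagged only because it contains "market" and "share".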
