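I have two data frames, a (the data file) and b (the reference). I need to compare their Answer columns using cosine similarity and get a data frame that holds, for each row of a, the best-matching value from b and its cosine score. After that, I need to count how many times each answer from b occurs in a (based on the best match found by cosine similarity).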
a <- data.frame(Answer = c("Hey <firstname>, here are some topics I have been helping folks",
"here are some topics I have been helping folks, have a nice day,<",
"hello there, here are some topics I have been helping folks",
"Your final job decisions post the cycle will be available on this site",
"Compensation details will be sent on mail. Final job decisions post the cycle will be available on this link, have a great day"))
b <- data.frame(Answer = c("here are some topics I have been helping folks", "Final Rewards decisions post the cycle will be available here", "reward decisions post the cycle will be available on this link, have a great day"))
Expected output:
Result <- data.frame(Answer = c("here are some topics I have been helping folks", "Final Rewards decisions post the cycle will be available here"), count = c(3, 2))
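To make the goal concrete, here is a rough base R sketch of the pipeline I have in mind: tokenize, build word-count vectors over a shared vocabulary, compute cosine similarity between every row of a and every row of b, keep the best match, then count. The helper names tokenize and term_matrix are made up for illustration, and using raw word counts (rather than TF-IDF or some other weighting) is an assumption on my part.

```r
# Sample data from the question. The second string of `a` is cut off in the
# original post and is kept as-is here.
a <- data.frame(Answer = c(
  "Hey <firstname>, here are some topics I have been helping folks",
  "here are some topics I have been helping folks, have a nice day,<",
  "hello there, here are some topics I have been helping folks",
  "Your final job decisions post the cycle will be available on this site",
  "Compensation details will be sent on mail. Final job decisions post the cycle will be available on this link, have a great day"
), stringsAsFactors = FALSE)
b <- data.frame(Answer = c(
  "here are some topics I have been helping folks",
  "Final Rewards decisions post the cycle will be available here",
  "reward decisions post the cycle will be available on this link, have a great day"
), stringsAsFactors = FALSE)

# Hypothetical helper: lowercase, keep letters only, split on spaces.
tokenize <- function(x) {
  x <- gsub("[^a-z ]+", " ", tolower(x))
  strsplit(trimws(x), " +")
}

# Hypothetical helper: one row per document, one column per vocabulary word,
# cells hold raw word counts (assumed weighting).
term_matrix <- function(tokens, vocab) {
  t(vapply(tokens,
           function(tk) as.numeric(table(factor(tk, levels = vocab))),
           numeric(length(vocab))))
}

tok_a <- tokenize(a$Answer)
tok_b <- tokenize(b$Answer)
vocab <- unique(unlist(c(tok_a, tok_b)))
m_a <- term_matrix(tok_a, vocab)
m_b <- term_matrix(tok_b, vocab)

# Cosine similarity: dot products divided by the product of the vector norms.
# Rows index `a`, columns index `b`.
sim <- tcrossprod(m_a, m_b) /
  outer(sqrt(rowSums(m_a^2)), sqrt(rowSums(m_b^2)))

# Best-matching reference answer (and its cosine) for every row of `a`.
best <- max.col(sim, ties.method = "first")
matches <- data.frame(
  Answer    = a$Answer,
  BestMatch = b$Answer[best],
  Cosine    = sim[cbind(seq_len(nrow(a)), best)],
  stringsAsFactors = FALSE
)

# Count how often each reference answer was chosen as the best match.
counts <- as.data.frame(table(factor(matches$BestMatch, levels = b$Answer)),
                        stringsAsFactors = FALSE)
names(counts) <- c("Answer", "count")
Result <- counts[counts$count > 0, ]
```

With raw word counts, the fifth answer in a ends up closer to the third reference than to the second (they share "on this link, have a great day"), so this sketch gives counts of 3, 1 and 1 instead of the 3 and 2 shown in the expected output. Getting the expected counts would need a different weighting or matching rule.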