I have a character vector
var1 <- c("pine tree", "forest", "fruits", "water")
and a list
var2 <- list(c("tree", "house", "star"), c("house", "tree", "dense forest"), c("apple", "orange", "grapes"))
I want to match the words in var1 against the words in var2 and rank the list elements by the number of matching words. For example,
[[2]]
[1] "house" "tree" "dense forest"
has 2 matches with var1,
[[1]]
[1] "tree" "house" "star"
has 1 match with var1, and
[[3]]
[1] "apple" "orange" "grapes"
has 0 matches with var1.
The desired output is:
[1] "house" "tree" "dense forest"
[2] "tree" "house" "star"
[3] "apple" "orange" "grapes"
I tried
sapply(var1, grep, var2, ignore.case=T, value=T)
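but it did not produce the desired output.
How can I solve this? A code snippet would be much appreciated. Thanks.
Edit:
As noted above, the question has been edited from phrase matching to matching words within phrases.
Answer 0 (score: 4)
You could try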
var2[[which.max(sapply(var2, function(x) sum(var1 %in% x)))]]
[1] "house" "tree" "forest"
Following the OP's latest edit and the comment from @franks:
var2[order(-sapply(var2, function(x) sum(var1 %in% x)))]
[[1]]
[1] "house" "tree" "forest"
[[2]]
[1] "tree" "house" "star"
[[3]]
[1] "apple" "orange" "grapes"
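Note that `%in%` compares whole strings, so with the edited data (where var2 contains "dense forest" rather than "forest") neither "pine tree" nor "forest" counts as an exact match. For the word-within-phrase matching described in the edit, one possible sketch is to split every phrase into words before comparing. The names `words1`, `count_matches`, `counts` and `ranked` are illustrative helpers that do not appear in the original post. The sketch also assumes that a phrase counts as one match when it shares at least one word with var1.

```r
var1 <- c("pine tree", "forest", "fruits", "water")
var2 <- list(c("tree", "house", "star"),
             c("house", "tree", "dense forest"),
             c("apple", "orange", "grapes"))

# Split every phrase in var1 into individual lower-case words.
words1 <- unique(tolower(unlist(strsplit(var1, "\\s+"))))

# Hypothetical helper: count the phrases in one list element that
# share at least one word with var1 (assumed definition of a "match").
count_matches <- function(x) {
  phrase_words <- strsplit(tolower(x), "\\s+")
  sum(vapply(phrase_words, function(w) any(w %in% words1), logical(1)))
}

counts <- vapply(var2, count_matches, integer(1))
counts
# [1] 1 2 0

# Reorder the list elements by decreasing match count.
ranked <- var2[order(-counts)]
ranked[[1]]
# [1] "house"        "tree"         "dense forest"
```

Splitting on whitespace keeps the comparison at the word level, so "tree" does not match inside a longer word such as "street", which a plain `grep` on substrings would allow.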