R: matching words in a list

Time: 2015-05-22 15:34:11

Tags: r list grep string-matching

I have a character vector

var1 <- c("pine tree", "forest", "fruits", "water")

and a list

var2 <- list(c("tree", "house", "star"),  c("house", "tree", "dense forest"), c("apple", "orange", "grapes"))

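I want to match the words in var1 against the words in var2 and rank the list elements by the number of matching words. For example,

[[2]]
[1] "house"  "tree"   "dense forest"

has 2 matches with var1,

[[1]]
[1] "tree"  "house" "star"

has 1 match with var1, and

[[3]]
[1] "apple"  "orange" "grapes"

has 0 matches with var1.

The desired output is as follows: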

[1] "house"  "tree"   "dense forest"
[2] "tree"  "house" "star"
[3] "apple"  "orange" "grapes"

I tried

sapply(var1, grep,  var2, ignore.case=T, value=T)

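This did not produce the desired output.

How can I solve this? A code snippet would be much appreciated. Thanks.

Edit:

As noted above, the question has been edited from matching whole phrases to matching words within phrases.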

1 answer:

Answer 0 (score: 4)

You could try

var2[[which.max(lapply(var2, function(x) sum(var1 %in% x)))]]
[1] "house"  "tree"   "forest"

Based on the OP's last edit and the comment from @franks

var2[order(-sapply(var2, function(x) sum(var1 %in% x)))]
[[1]]
[1] "house"  "tree"   "forest"
[[2]]
[1] "tree"  "house" "star" 
[[3]]
[1] "apple"  "orange" "grapes"
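
Note that `%in%` compares whole strings, so with the edited data "pine tree" and "dense forest" would not match "tree" or "forest". Below is a minimal sketch of word-level matching, assuming phrases are split on whitespace and compared case-insensitively. The helper `to_words` and the variables `words1`, `counts`, and `ranked` are illustrative names, not part of the original answer.

```r
var1 <- c("pine tree", "forest", "fruits", "water")
var2 <- list(c("tree", "house", "star"),
             c("house", "tree", "dense forest"),
             c("apple", "orange", "grapes"))

# Hypothetical helper: break phrases into unique lowercase words
# (assumes words are separated by whitespace)
to_words <- function(x) unique(tolower(unlist(strsplit(x, "\\s+"))))

# Words that occur anywhere in var1
words1 <- to_words(var1)

# For each list element, count how many of its words also occur in var1
counts <- vapply(var2, function(x) sum(to_words(x) %in% words1), integer(1))

# Reorder the list so that elements with more matches come first
ranked <- var2[order(-counts)]
ranked
```

With the question's data, `counts` comes out as 1, 2 and 0, so the second element is ranked first, followed by the first and then the third.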