Find elements common to multiple vectors that appear in at least a given percentage of them

Time: 2017-01-05 14:27:10

Tags: r

Suppose I have 4 vectors:

a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

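I want to select the names that overlap across these vectors, with the requirement that a name must appear in at least 3 of the 4 vectors. Ideally, I would also like to be able to easily set the percentage of vectors in which a name must be present.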

Can I somehow modify intersect to do this?

1 answer:

Answer 0: (score: 7)

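I think this will work. We use the table function to do most of the heavy lifting: it counts how many times each name occurs across all the vectors, and dividing by the number of vectors gives the share of vectors each name appears in.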

find_perc <- function(..., perc = .75){
    list_len <- length(list(...)) # how many vectors
    tab_it <- table(c(...)) # tabulate all the names
    tab_it_perc <- tab_it / list_len # calculate the frequencies
    names(tab_it_perc[tab_it_perc >= perc]) # return those with freq >= perc
}


> find_perc(a, b, c, d)
[1] "Greg"   "Mark"   "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg"   "Igor"   "Kate"   "Mark"   "Mary"   "Mathew" "Robin"  "Tobias"
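
One caveat: table counts every occurrence, so a name repeated inside a single vector is counted more than once and can pass the threshold on its own. Below is a minimal sketch of a variant that guards against this, assuming each vector should contribute at most one count per name. The name find_perc_unique is hypothetical and not part of the original answer.

```r
# Hypothetical variant of find_perc: each vector counts a name at most once.
find_perc_unique <- function(..., perc = 0.75) {
    vecs <- list(...)                        # collect the input vectors
    deduped <- lapply(vecs, unique)          # drop repeats within each vector
    tab_it <- table(unlist(deduped))         # number of vectors containing each name
    tab_it_perc <- tab_it / length(vecs)     # share of vectors containing each name
    names(tab_it_perc[tab_it_perc >= perc])  # keep names at or above the threshold
}

a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

# Same result as find_perc when no vector contains repeats
find_perc_unique(a, b, c, d)
# [1] "Greg"   "Mark"   "Mathew"

# "Kate" is repeated inside the first vector but is present in only 2 of 4 vectors
find_perc_unique(c("Kate", "Kate", "Kate"), b, c, d)
# [1] "Mathew"
```

With the original find_perc, the second call would also return "Kate", because its four occurrences divided by four vectors give a ratio of 1.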