Suppose I have 4 vectors:
a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")
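I want to select the overlapping names from these vectors, with the condition that a name must appear in at least 3 of the 4 vectors. Ideally, I'd also like to be able to specify that threshold easily as a percentage of the vectors in which a name must be present.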
Can I somehow modify intersect to do this?
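For comparison, here is a minimal base-R sketch of how far intersect alone gets you. intersect() takes exactly two vectors, so Reduce() folds it across the list; the result keeps only names present in every vector, i.e. the fixed 100% case, with no way to relax the threshold. The variable name in_all is illustrative only.

```r
a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

# intersect() works on two vectors at a time; Reduce() applies it pairwise
# across the whole list, so only names found in ALL four vectors survive
in_all <- Reduce(intersect, list(a, b, c, d))
print(in_all)  # "Mathew" is the only name present in all four vectors
```

Because every step of the fold discards anything missing from a single vector, a partial-overlap threshold such as "at least 3 of 4" needs counting rather than set intersection.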
Answer 0 (score: 7)
I think this will work. We use the table function to do most of the heavy lifting.
find_perc <- function(..., perc = .75){
  vecs <- list(...)                               # the input vectors
  list_len <- length(vecs)                        # how many vectors
  tab_it <- table(unlist(lapply(vecs, unique)))   # count each name at most once per vector
  tab_it_perc <- tab_it / list_len                # fraction of vectors containing each name
  names(tab_it_perc[tab_it_perc >= perc])         # return those with fraction >= perc
}
> find_perc(a, b, c, d)
[1] "Greg" "Mark" "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg" "Igor" "Kate" "Mark" "Mary" "Mathew" "Robin" "Tobias"
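If you would rather state the threshold directly as a number of vectors ("at least 3") than as a fraction, a small variant works the same way. This is a sketch under the assumption that the vectors are passed as one list; the name find_in_at_least and its min_n parameter are hypothetical, not part of base R. As above, unique() is applied inside each vector so that a name repeated within one vector is counted only once.

```r
# Hypothetical helper: return names that occur in at least min_n of the vectors
find_in_at_least <- function(vecs, min_n) {
  # unique() per vector, then tabulate how many vectors contain each name
  counts <- table(unlist(lapply(vecs, unique)))
  names(counts)[counts >= min_n]
}

a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

res3 <- find_in_at_least(list(a, b, c, d), 3)
print(res3)
```

Working with a count avoids floating-point comparisons on fractions such as 2/3, at the cost of having to know how many vectors you passed.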