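Comparing more than two vectors in R (voting)

Date: 2017-05-06 23:51:00

Tags: r

I have 5 vectors, and every item in them is either "yes" or "no". I want to compare these 5 vectors position by position (row-wise), compute the majority vote for each position, and store the results in a new vector. How can I do this efficiently?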

v1=c("yes","no","no","yes")
v2=c("no","no","yes","yes")
v3=c("yes","yes","no","yes")
v4=c("yes","no","yes","yes")
v5=c("yes","yes","yes","no")
#The expected output is "yes", "no", "yes", "yes"

2 answers:

Answer 0: (score: 3)

First put the data in a character-based data frame:

dat <- data.frame( v1=c("yes","no","no","yes"),
                  v2=c("no","no","yes","yes"),
                  v3=c("yes","yes","no","yes"),
                  v4=c("yes","no","yes","yes"),
                  v5=c("yes","yes","yes","no"), stringsAsFactors=FALSE)

Then, for each row, tabulate the values and pull out the name of the largest count in the table object:

 apply(dat, 1, function(x) names(which.max(table(x))) )
[1] "yes" "no"  "yes" "yes"
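To see what the anonymous function does for a single row, here is a minimal sketch that works through the second position by hand. The names `row2`, `counts`, `winner` and `tie_winner` are made up for illustration and are not part of the answer above.

```r
# Second element of each of the five vectors v1..v5 from the question
row2 <- c("no", "no", "yes", "no", "yes")

# table() counts each distinct value; names come out sorted ("no", "yes")
counts <- table(row2)

# which.max() returns the position of the first largest count,
# and names() turns that position back into the label
winner <- names(which.max(counts))

# With an even number of voters a tie is possible; which.max() then
# picks the first label in sorted order, which here would be "no"
tie_winner <- names(which.max(table(c("yes", "no"))))
```

With five voters and two possible values a tie cannot occur, so the tie behaviour only matters if the same idea is reused with an even number of vectors.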

Answer 1: (score: 0)

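Another approach is to use mapply with "==" to return a matrix of TRUE and FALSE values, marking which elements of the vectors are equal to a given value (here "yes"). rowMeans then computes the proportion of TRUE values across each row, and > 0.5 checks for a majority. Adding 1 converts the logical result to a numeric position (1 or 2), which is then used to select from the elements of c("no", "yes").

c("no", "yes")[(rowMeans(mapply("==", MoreArgs=list("yes"), myList)) > 0.5) + 1L]
[1] "yes" "no"  "yes" "yes"

An alternative using matrix multiplication is

c("no", "yes")[((do.call(cbind, myList) == "yes") %*%
               rep(1, length(myList)) > (length(myList) / 2)) + 1L]
[1] "yes" "no"  "yes" "yes"

Note that the vectors are first put in a list named myList, for example as below.

Data

myList <- list(v1, v2, v3, v4, v5)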
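The one-liners above pack several steps into a single expression. The sketch below spells out the intermediate values, assuming the five vectors from the question are collected with list(); the names `is_yes`, `prop_yes`, `idx` and `result` are illustrative only.

```r
v1 <- c("yes", "no", "no", "yes")
v2 <- c("no", "no", "yes", "yes")
v3 <- c("yes", "yes", "no", "yes")
v4 <- c("yes", "no", "yes", "yes")
v5 <- c("yes", "yes", "yes", "no")

# Assumed way of building the list used by the one-liners
myList <- list(v1, v2, v3, v4, v5)

# 4 x 5 logical matrix: one row per position, one column per vector
is_yes <- do.call(cbind, myList) == "yes"

# Share of "yes" votes at each position
prop_yes <- rowMeans(is_yes)

# FALSE + 1L is 1 (selects "no"), TRUE + 1L is 2 (selects "yes")
idx <- (prop_yes > 0.5) + 1L

result <- c("no", "yes")[idx]
```

Here `prop_yes` works out to 0.8, 0.4, 0.6 and 0.8, so `result` matches the expected output given in the question.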