Question

I want to count the number of first, second, third, etc. preferences that each candidate receives in a (randomly generated) election:

library(tidyverse)
library(magrittr)

set.seed(42)

results <- replicate(10, sample.int(5,5)) %>%
 t() %>%
 tbl_df() %>%
 set_colnames(c("A", "B", "C", "D", "E"))

# A tibble: 10 x 5
     A     B     C     D     E
   <int> <int> <int> <int> <int>
 1     5     4     1     2     3
 2     3     5     1     2     4
 3     3     5     4     1     2
 4     5     4     1     3     2
 5     5     1     3     2     4
 6     3     2     5     1     4
 7     4     5     2     3     1
 8     5     1     4     2     3
 9     2     5     1     4     3
10     5     4     2     3     1

The function I use to do this is:

count_prefs <- function(df, candidate, round) {
  df %>%
    filter_at(vars(candidate), all_vars(. == round)) %>%
    nrow()
}

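The output I want is an 'n by m' table, where n is the number of candidates and m is the number of rounds (I realize that n = m in this case, but I would also like to solve this more generally). I tried: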

map2_dbl(colnames(results), c(1:5), count_prefs, df = results)

but that returns

[1] 0 1 1 1 0

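which covers only the pairs "A 1", "B 2", "C 3", "D 4", "E 5", because map2_dbl() walks the two vectors in parallel instead of over every combination.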
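To make the difference concrete, here is a minimal sketch on a hypothetical three-candidate example (the vectors `cands` and `rounds` are made up for illustration). It contrasts the element-wise pairing done by map2 with the full set of combinations, which expand.grid() enumerates in base R:

```r
library(purrr)

# Hypothetical inputs, not taken from the election data above
cands  <- c("A", "B", "C")
rounds <- 1:3

# map2_* pairs the i-th candidate with the i-th round: 3 calls in total
paired <- map2_chr(cands, rounds, paste)

# expand.grid() lists every candidate/round combination: 3 * 3 = 9 rows
combos <- expand.grid(candidate = cands, round = rounds,
                      stringsAsFactors = FALSE)

paired
nrow(combos)
```

Only the second form visits all n * m cells of the desired table.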

My solution so far is to use cross2() to get a list of all combinations and apply a tweaked version of the same function:

count_prefs2 <- function(df, cand_round) {
  df %>%
    filter_at(vars(cand_round[[1]]), all_vars(. == cand_round[[2]])) %>%
    nrow()
}

map_int(cross2(colnames(results), c(1:5)), count_prefs2, df = results)

[1] 0 2 4 2 2 1 1 2 4 2 3 0 1 3 3 1 3 2 1 3 5 4 1 0 0

This gives me the right numbers, but I then need to convert the result to a matrix and then to a data frame to get the output I want:

map_int(cross2(colnames(results), c(1:5)), count_prefs2, df = results) %>%
  matrix(nrow = 5, ncol = 5, byrow = TRUE) %>%
  tbl_df() %>%
  set_colnames(c("A", "B", "C", "D", "E"))

# A tibble: 5 x 5
      A     B     C     D     E
  <int> <int> <int> <int> <int>
1     0     2     4     2     2
2     1     1     2     4     2
3     3     0     1     3     3
4     1     3     2     1     3
5     5     4     1     0     0

Is there a more elegant solution to this problem?

Answer 1

As @markus mentioned, a shorter base R option is to use stack (df here stands for the results data frame from the question):

table(stack(df))
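As a rough illustration of why this works, the sketch below runs the same call on a small made-up data frame (`toy` is hypothetical, not the election data). stack() reshapes the columns into a long data frame with a `values` column and an `ind` column, and table() then cross-tabulates the two:

```r
# Hypothetical toy ballots: two candidates, three voters
toy <- data.frame(A = c(1, 2, 1), B = c(2, 1, 2))

# Long format: 'values' holds the rank, 'ind' holds the candidate name
long <- stack(toy)

# Cross-tabulate rank (rows) against candidate (columns)
counts <- table(long)
counts
```

One caveat: table() only creates rows for ranks that actually occur in the data, so a rank that no candidate ever received would be missing from the result rather than shown as a row of zeros.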

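A base R approach using sapply is to compute the frequencies of each column with table, setting levels to the maximum value a column can take (here, the number of columns in the data frame).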

n <- ncol(df)
sapply(df, function(x) table(factor(x, levels = 1:n)))


#  A B C D E
#1 0 2 4 2 2
#2 1 1 2 4 2
#3 3 0 1 3 3
#4 1 3 2 1 3
#5 5 4 1 0 0
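The explicit levels matter because table() on a plain vector silently drops values that never occur. The sketch below shows this on a made-up vector, and then applies the same idea to a hypothetical case where the number of rounds `m` differs from the number of candidates (both `toy` and `m` are assumptions for illustration):

```r
x <- c(1, 1, 3)

# Without levels: rank 2 is missing from the result
plain <- table(x)

# With levels: rank 2 appears with a count of 0
full <- table(factor(x, levels = 1:3))

# Hypothetical general case: 2 candidates, ranks allowed to run from 1 to m = 3
toy <- data.frame(A = c(1, 2, 1), B = c(2, 1, 2))
m <- 3
general <- sapply(toy, function(col) table(factor(col, levels = seq_len(m))))

names(plain)
as.vector(full)
dim(general)
```

Setting the levels from `m` rather than from `ncol(df)` is one way to get an m-by-n table when the two differ.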

With purrr we can do

purrr::map_dfr(df,~table(factor(., levels = 1:n)))

# A tibble: 5 x 5
#      A     B     C     D     E
#  <int> <int> <int> <int> <int>
#1     0     2     4     2     2
#2     1     1     2     4     2
#3     3     0     1     3     3
#4     1     3     2     1     3
#5     5     4     1     0     0

How to iterate over all combinations of two function arguments and return an 'n by m' matrix in R
