Question

I have the following table:

df = structure(list(test_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4), person = structure(c(1L, 
2L, 3L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 3L), .Label = c("a", "b", 
"c", "d"), class = "factor"), points = c(1, 5, 2, 6, 5, 3, 4, 
5, 6, 2, 1)), .Names = c("test_id", "person", "points"), row.names = c(NA, 
-11L), class = "data.frame")

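I want to work out a few scenarios, and I need help with scenarios 2 & 3:

1. Which single person scored the most points across all tests, e.g. using dplyr: df %>% group_by(person) %>% summarize(most_points = sum(points)) %>% top_n(1,most_points)
2. Which two people together scored the most points across all tests (taking, for each test, the higher of the two people's scores)

Expected output: person a & person b would be the best two-person combination, with a points total of 17 (b wins tests 1, 2 & 3 and a wins test 4).
3. Which three people together scored the most points across all tests (taking, for each test, the highest of the three people's scores)

Expected output: person a, person b & person d would be the best three-person combination, with a points total of 19 (person b wins tests 1 and 2, person d wins test 3, and person a wins test 4).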

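This is a heavily simplified table; in reality I have hundreds of thousands of rows to analyze, with more people and more test_ids. Note that not every person has a score for every test. I'm not interested in a weighted average, only in the highest number of points they (or a combination of two or three people) accumulated.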

Answer 1

It took me a while to get here, but here is something you could do. We first change the format of the data from long to wide:

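library(tidyr)
# One row per test, one column per person
dfs = spread(df, person, points)
# A person who did not take a test contributes 0 points
dfs[is.na(dfs)] = 0
# Take the names from the columns so they always match the column order
pers = names(dfs)[-1]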

This returns:

  test_id a b c d
1       1 1 5 2 0
2       2 0 6 5 0
3       3 3 4 5 6
4       4 2 0 1 0
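
The same wide layout can also be built without tidyr. The following is a minimal sketch rather than part of the original answer: it rebuilds the question's data so that it runs on its own, and `scores` is a name introduced here for illustration. `tapply` leaves NA where a person has no row for a test, so those cells are set to 0 just as above.

```r
# Rebuild the question's data so the block is self-contained
df <- data.frame(
  test_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  person  = factor(c("a", "b", "c", "b", "c", "a", "b", "c", "d", "a", "c")),
  points  = c(1, 5, 2, 6, 5, 3, 4, 5, 6, 2, 1)
)

# Rows are tests, columns are persons, cells hold the points scored
scores <- tapply(df$points, list(test_id = df$test_id, person = df$person), sum)

# A missing test/person combination counts as 0 points
scores[is.na(scores)] <- 0

print(scores)
```

Working with a plain numeric matrix instead of a data frame also avoids the `+ 1` column offset that is needed to skip the test_id column.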

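Then we use combn to find all combinations of two people, take the maximum of the two scores for each test, and sum those maxima over all tests. With that, we can identify the pair (or pairs, in case of a tie) with the highest sum: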

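# All pairs of person indices, one pair per column
cc2 = combn(seq_along(pers), 2)
# For each pair: per-test maximum of the two columns (+1 skips test_id), summed over tests
values2 = sapply(seq_len(ncol(cc2)), function(i) sum(apply(dfs[, cc2[, i] + 1], 1, max)))
names(values2) = apply(cc2, 2, function(x) paste(pers[x], collapse = "-"))
# Keep only the best-scoring pair(s), including ties
values2 = values2[values2 == max(values2)]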

This returns:

a-b b-c b-d 
 17  17  17
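
To see where a total of 17 comes from, it can help to work through one pair by hand. This is a small illustrative sketch, not part of the original answer: the vectors `a` and `b` are typed in from the wide table shown earlier, with 0 meaning the person did not take that test.

```r
# Points per test (tests 1 to 4), copied from the wide table
a <- c(1, 0, 3, 2)
b <- c(5, 6, 4, 0)

# Element-wise maximum: the better of the two scores on each test
best_per_test <- pmax(a, b)
print(best_per_test)       # 5 6 4 2

# Summing the per-test maxima gives the pair's combined score
print(sum(best_per_test))  # 17
```

Here b contributes the maximum on tests 1 to 3 and a on test 4, which matches the expected output in the question.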

For combinations of three people, we do the same:

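# All triples of person indices, one triple per column
cc3 = combn(seq_along(pers), 3)
# For each triple: per-test maximum of the three columns (+1 skips test_id), summed over tests
values3 = sapply(seq_len(ncol(cc3)), function(i) sum(apply(dfs[, cc3[, i] + 1], 1, max)))
names(values3) = apply(cc3, 2, function(x) paste(pers[x], collapse = "-"))
# Keep only the best-scoring triple(s), including ties
values3 = values3[values3 == max(values3)]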

This returns:

a-b-d 
   19
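
Since the pair and triple code differ only in the group size, the two steps could be folded into one helper. The following is a sketch under stated assumptions, not part of the original answer: `best_combos` and `scores` are hypothetical names, and the score matrix is typed in from the wide table shown earlier. The number of groups grows as choose(n, k), so with many people this brute-force search may become slow.

```r
# Score matrix from the wide table: rows are tests, columns are persons
scores <- matrix(
  c(1, 5, 2, 0,
    0, 6, 5, 0,
    3, 4, 5, 6,
    2, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("a", "b", "c", "d"))
)

# Return the best-scoring group(s) of size k, including ties (hypothetical helper)
best_combos <- function(scores, k) {
  people <- colnames(scores)
  # combn applies FUN to every size-k subset of the column indices
  totals <- combn(ncol(scores), k, FUN = function(idx) {
    # Per-test maximum within the group, summed over all tests
    sum(apply(scores[, idx, drop = FALSE], 1, max))
  })
  # Label each total with its group, e.g. "a-b"; combn enumerates in the same order
  names(totals) <- combn(people, k, FUN = paste, collapse = "-")
  totals[totals == max(totals)]
}

print(best_combos(scores, 2))
print(best_combos(scores, 3))
```

On the sample data this should reproduce the results above: a three-way tie at 17 for pairs and a-b-d with 19 for triples.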

Answer 2

Top two people

df %>% 
  group_by(person) %>% 
  summarize(most_points = sum(points)) %>% 
  arrange(desc(most_points)) %>%
  top_n(2, most_points)

# A tibble: 2 × 2
  person most_points
  <fctr>       <dbl>
1      b          15
2      c          13

Top three people (actually four, because of a tie)

df %>% 
  group_by(person) %>% 
  summarize(most_points = sum(points)) %>% 
  arrange(desc(most_points)) %>%
  top_n(3, most_points)

# A tibble: 4 × 2
  person most_points
  <fctr>       <dbl>
1      b          15
2      c          13
3      a           6
4      d           6

Combination analysis to maximize in R
