I have a dataset that looks like this:
> df
teams people entries
1 A Team 6fd1 49
2 A Team 1df5 4
3 A Team 2hgt 19
4 A Team 8akt 4
5 A Team sdf9 19
6 B Team asc1 42
7 B Team abm8 32
8 B Team plo9 38
9 B Team 90la 5
10 B Team 8uil 23
> dput(df)
structure(list(teams = c("A Team", "A Team", "A Team", "A Team",
"A Team", "B Team", "B Team", "B Team", "B Team", "B Team"),
people = c("6fd1", "1df5", "2hgt", "8akt", "sdf9", "asc1",
"abm8", "plo9", "90la", "8uil"), entries = c(49, 4, 19, 4,
19, 42, 32, 38, 5, 23)), .Names = c("teams", "people", "entries"
), row.names = c(NA, -10L), class = "data.frame")
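I can get the fraction of each team that accounts for 75% or more of its entries by doing the following, although it is messy and probably not the best way: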
# sorted df and added cumulative percentage/sum and row number per team
> df
teams people entries cumulative_sum cumulative_perc number
1 A Team 6fd1 49 49 51.57895 1
3 A Team 2hgt 19 68 71.57895 2
5 A Team sdf9 19 87 91.57895 3
2 A Team 1df5 4 91 95.78947 4
4 A Team 8akt 4 95 100.00000 5
7 B Team abm8 89 89 45.17766 1
6 B Team asc1 42 131 66.49746 2
8 B Team plo9 38 169 85.78680 3
10 B Team 8uil 23 192 97.46193 4
9 B Team 90la 5 197 100.00000 5
# from this view, each team has 3/5 people (60%) reaching the minimum 75%
# entries, and using ddply, we can get that
ddply(df, 'teams', summarise,
marker = min(which(cumulative_perc > 75)),
total = NROW(teams),
seventyfive = marker/total)
teams marker total seventyfive
1 A Team 3 5 0.6
2 B Team 3 5 0.6
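Although this works, I only want to count the portion of the third person's entries that is actually needed to reach 75% of the team's entries. For example, for A Team, 75% of its entries is 72 (rounded up), which means only 4 of the third person's 19 entries count, giving that team 2.21/5 instead of 3/5.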
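The arithmetic behind the 2.21/5 figure can be checked by hand. The lines below are only a sketch of that calculation for A Team, using the sorted values from the question; the variable names are illustrative.

```r
# A Team's entries, sorted in descending order (values from the question)
entries <- c(49, 19, 19, 4, 4)

threshold <- ceiling(0.75 * sum(entries))    # 0.75 * 95 = 71.25, rounded up to 72
covered   <- sum(entries[1:2])               # the first two people contribute 68
needed    <- threshold - covered             # 4 entries still needed from the third person

fraction_of_team <- 2 + needed / entries[3]  # 2 + 4/19
round(fraction_of_team, 2)                   # 2.21
```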
Answer 0 (score: 1)
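library(dplyr)

# Sort by descending entries within each team; the result below relies on this order.
df %>% arrange(teams, desc(entries)) %>% group_by(teams) %>%
  summarise(seventyfive = {
    tmp1 <- ceiling(0.75 * sum(entries)); tmp2 <- sum(cumsum(entries) < tmp1)
    # head() also behaves correctly when tmp2 is 0, unlike entries[1:tmp2]
    tmp2 + (tmp1 - sum(head(entries, tmp2))) / entries[tmp2 + 1]
  })
# A tibble: 2 x 2
#   teams  seventyfive
#   <chr>        <dbl>
# 1 A Team        2.21
# 2 B Team        2.78
Here tmp1 is 75% of the team's entries (rounded up), and tmp2 is the largest number of people whose cumulative entries still stay below that threshold. The last line then computes the required quantity directly.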
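The same calculation can also be written without dplyr. The following is a minimal base R sketch, not the answerer's code: the helper name fractional_marker and its parameter p are hypothetical, and it assumes the entries are positive whole-number counts. It sorts inside the function, so it does not depend on row order, and it covers the case where the first person alone already reaches the threshold.

```r
# Hypothetical helper (name and signature are illustrative, not from the answer).
# Returns the fractional number of people needed to reach share p of the entries.
fractional_marker <- function(entries, p = 0.75) {
  entries <- sort(entries, decreasing = TRUE)      # largest contributors first
  target  <- ceiling(p * sum(entries))             # tmp1 in the answer
  running <- cumsum(entries)
  full    <- sum(running < target)                 # tmp2 in the answer
  before  <- if (full == 0) 0 else running[full]   # entries covered by whole people
  full + (target - before) / entries[full + 1]
}

# Data from the dput() output in the question (people column omitted)
df <- data.frame(
  teams   = rep(c("A Team", "B Team"), each = 5),
  entries = c(49, 4, 19, 4, 19, 42, 32, 38, 5, 23)
)

# Apply per team; gives 2.21 for A Team and 2.78 for B Team
round(sapply(split(df$entries, df$teams), fractional_marker), 2)
```

Sorting inside the helper is a design choice: the result then depends only on the set of values per team, not on how the data frame happens to be ordered.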
Answer 1 (score: 1)
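lead() gives you the value of a variable from the next row in the current group.
The approach below filters each team down to the single row where the entries still missing to reach the minimum count, divided by the next row's entries, is a fraction between 0 and 1.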
df %>%
  group_by(teams) %>%
  arrange(teams, -entries) %>%
  mutate(delta = (ceiling(0.75 * sum(entries)) - cumsum(entries)) / lead(entries),
         marker = row_number() + delta) %>%
  filter(delta >= 0 & delta <= 1) %>%
  select(teams, marker)
# A tibble: 2 x 2
# Groups: teams [2]
#   teams  marker
#   <chr>   <dbl>
# 1 A Team   2.21
# 2 B Team   2.78
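To see why the filter keeps exactly one row per team, it can help to look at delta for every row. This is a small base R sketch for B Team, using the values from the dput() output sorted in descending order; c(entries[-1], NA) is used here as a stand-in for dplyr's lead(), and the variable names are illustrative.

```r
# B Team's entries in descending order (values from the dput() output)
entries <- c(42, 38, 32, 23, 5)

next_entries <- c(entries[-1], NA)   # base R stand-in for lead(entries)
delta  <- (ceiling(0.75 * sum(entries)) - cumsum(entries)) / next_entries
marker <- seq_along(entries) + delta

# Mirrors the filter() condition; the last row has no next row, so delta is NA
keep <- !is.na(delta) & delta >= 0 & delta <= 1

round(delta, 2)   # 1.66 0.78 -0.30 -6.00 NA
marker[keep]      # 2.78125
```

Only the second row has a delta between 0 and 1: before it the threshold is still more than one person away, and after it the threshold has already been passed.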