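This is about athletes who competed in the Olympic Games. I am supposed to work out the top ten athletes with the longest medal-winning streaks.
For example: won in 2004, 2008 and 2012 -> the athlete won 3 times in a row.
I am learning R and I am lost on this one.
I don't even know where to start.
I have "cleaned" my data as far as I could:
- only athletes who won a gold medal
- only the actual years in which they won
My columns (after cleaning):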
id name team year medal
1 john doe USA 2004 gold
1 john doe USA 2008 gold
1 john doe USA 2012 gold
2 marc twain GER 2016 gold
3 edgar poe FIN 2000 gold
3 edgar poe FIN 2008 gold
I have tried something like this:
mutate(won =
if_else(condition = year == year +4,
true = "won",
false = "lost"))
or something like this:
mutate(won =
if_else(
condition = (year + 4) == tmp_year,
true = "Following Year",
false = if_else(
condition = year == tmp_year,
true = "Actual year",
false = "No")))
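With this I only ever get "Actual year" and "No" as results.
In the end, I want a table that shows how many times in a row each athlete won a gold medal.
For example, for the dataset above it would look like this: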
id name won
1 john doe 3
2 marc twain 1
3 edgar poe 1
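Edit: I am not looking for a complete answer, more for inspiration: which functions might be worth looking at.
Answer 0 (score: 1)
Using dplyr, we can group_by name, take the diff of year within each name to get the gap between gold-medal years, and then count the rows that share the same gap.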
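library(dplyr)
df %>%
  group_by(name) %>%
  # gap to the previous gold; the first win of each athlete is given a gap of 4
  mutate(diff = c(4, diff(year))) %>%
  # count rows per athlete and gap value (equal gaps other than 4 are grouped together too)
  group_by(name, diff) %>%
  summarise(count = n()) %>%
  select(-diff)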
# name count
# <fct> <int>
#1 edgarpoe 1
#2 edgarpoe 1
#3 johndoe 3
#4 marctwain 1
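Grouping by the gap value puts all equal gaps in one bucket, whether or not they are adjacent. A variation that numbers each streak explicitly is sketched below. It is only a minimal sketch: it assumes a win counts as "consecutive" when it comes exactly 4 years after the previous one, the names `streak_id`, `len` and `streaks` are made up for illustration, and the `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data copied from the question (gold medals only).
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 3L),
  name = c("john doe", "john doe", "john doe", "marc twain", "edgar poe", "edgar poe"),
  year = c(2004L, 2008L, 2012L, 2016L, 2000L, 2008L)
)

streaks <- df %>%
  arrange(id, year) %>%
  group_by(id, name) %>%
  # A new streak starts at the first win and whenever the gap is not 4 years.
  mutate(streak_id = cumsum(c(TRUE, diff(year) != 4))) %>%
  # Length of every individual streak.
  group_by(id, name, streak_id) %>%
  summarise(len = n(), .groups = "drop") %>%
  # Longest streak per athlete.
  group_by(id, name) %>%
  summarise(won = max(len), .groups = "drop")

streaks
```

On the question's data this gives 3 for john doe and 1 for the other two athletes. Sorting with arrange(desc(won)) and taking head(10) would then give a top ten.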
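Answer 1 (score: 1)
Here is one option using cumsum and dplyr::lead, where the default for lead is the athlete's last year + 4, so that the final row of each athlete is counted as well (this accounts for players who have more than one medal).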
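library(dplyr)
df %>% group_by(id) %>%
  # flag: gap to the next win (the last win gets 4); won: running count of 4-year gaps
  mutate(flag = lead(year, default = last(year) + 4) - year, won = cumsum(flag == 4)) %>%
  # keep the row with the highest count for each athlete
  select(-flag) %>% slice(which.max(won))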
# A tibble: 3 x 6
# Groups: id [3]
id name team year medal won
<int> <chr> <chr> <int> <chr> <int>
1 1 john doe USA 2012 gold 3
2 2 marc twain GER 2016 gold 1
3 3 edgar poe FIN 2008 gold 1
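One thing to watch with cumsum(flag == 4): the running count never resets, so an athlete with two separate streaks has them added together. The base R sketch below illustrates this on a made-up athlete (the `years` vector is hypothetical, not from the question) and shows rle() as one possible way to measure the longest unbroken run instead.

```r
# Hypothetical athlete with two separate streaks: 2000-2004 and 2012-2016.
years <- c(2000L, 2004L, 2012L, 2016L)

# Gap to the next win; the last win gets 4, mirroring the lead() default above.
flag <- c(diff(years), 4L)
running <- cumsum(flag == 4)   # 1 1 2 3: the count carries on across the 8-year gap

# rle() splits the gaps into runs; a run of k four-year gaps covers k + 1 wins.
gaps <- rle(diff(years) == 4)
longest <- max(c(0L, gaps$lengths[gaps$values])) + 1L

longest   # 2
```

Here the running count ends at 3 although the longest unbroken streak is 2 wins.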
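This can also be done in a compact way:
library(dplyr)
library(data.table)  # provides rleid()
df %>% group_by(id, name, team) %>%
  mutate(yearlead = lead(year, default = year[n()] + 4), yeardiff = yearlead - year) %>%
  # add = TRUE keeps the existing grouping (spelled .add = TRUE in dplyr >= 1.0.0)
  group_by(grp = rleid(case_when(yeardiff == 4 ~ as.integer(yeardiff), TRUE ~ row_number())), add = TRUE) %>%
  summarise(n = n())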
# A tibble: 4 x 5
# Groups: id, name, team [?]
id name team grp n
<int> <chr> <chr> <int> <int>
1 1 john doe USA 1 3
2 2 marc twain GER 1 1
3 3 edgar poe FIN 1 1
4 3 edgar poe FIN 2 1
Data (this data differs from the OP's dataset)
df <- structure(list(id = c(1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L), name = c("john doe", "john doe", "john doe", "marc twain", "edgar poe", "edgar poe", "edgar poe", "edgar poe", "edgar poe"),
team = c("USA", "USA", "USA", "GER", "FIN", "FIN", "FIN", "FIN", "FIN"), year = c(2004L, 2008L, 2012L, 2016L, 2000L, 2008L, 2016L, 2020L, 2024L), medal = c("gold", "gold", "gold", "gold", "gold", "gold", "gold", "gold", "gold" )), class = "data.frame", row.names = c(NA, -9L))