How to in R

Time: 2018-09-11 19:29:45

Tags: r

year	event	athlete	time
2000	100m	Ato Boldon	9.95
2000	100m	Brian Lewis	10.02
2000	100m	Coby Miller	9.98
2000	100m	Francis Obikwelu	9.97
2000	100m	Jon Drummond	9.96
2000	100m	Maurice Greene	9.86
2000	100m	Michael Marsh	10.01
2000	100m	Obadele Thompson	9.97
2000	100m	Tony McCall	10.06
2001	100m	Ato Boldon	9.88
2001	100m	Aziz Zakari	10.04
2001	100m	Bernard Williams	9.96
2001	100m	Dwain Chambers	10
2001	100m	Josh Norman	10.17
2001	100m	Kim Collins	10.04
2001	100m	Leonard Scott	10.05
2001	100m	Mark Lewis-Francis	10.12
2001	100m	Maurice Greene	9.9
2002	100m	Bernard Williams	9.99
2002	100m	Chris Williams	10.13
2002	100m	Francis Obikwelu	10.01
2002	100m	J.J. Johnson	9.95
2002	100m	Kim Collins	9.98
2002	100m	Marc Burns	10.18
2002	100m	Mark Lewis-Francis	10.04
2002	100m	Maurice Greene	9.89
2002	100m	Shingo Suetsugu	10.05
2002	100m	Taiwo Ajibade	10.18
2003	100m	Bernard Williams	10.04
2003	100m	Deji Aliu	9.95
2003	100m	Dwain Chambers	10.06
2003	100m	Hristóforos Hoídis	10.16
2003	100m	J.J. Johnson	10.05
2003	100m	John Capel	9.97
2003	100m	Justin Gatlin	9.97
2003	100m	Kim Collins	9.99
2003	100m	Maurice Greene	9.94
2004	100m	Asafa Powell	9.87
2004	100m	Ato Boldon	10.09
2004	100m	Christie van Wyk	10.09
2004	100m	Darrel Brown	10.11
2004	100m	Francis Obikwelu	10.02
2004	100m	Justin Gatlin	9.92
2004	100m	Maurice Greene	9.91
2004	100m	Mickey Grimes	10.12
2004	100m	Shawn Crawford	9.88
2005	100m	Asafa Powell	9.77
2005	100m	Aziz Zakari	9.99
2005	100m	Dwight Thomas	10
2005	100m	Francis Obikwelu	10.04
2005	100m	Justin Gatlin	9.89
2005	100m	Leonard Scott	9.94
2005	100m	Marc Burns	9.96
2005	100m	Maurice Greene	10.01
2005	100m	Shawn Crawford	9.99

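I am working with a dataset in R that has four columns: year, event, athlete, and score. Each row is an observation of an athlete's score in a given event and year.

What I would like to do is create a new column that shows each athlete's all-time best score, where the best score is their lowest score.

In Excel, I would write a MINIFS formula that checks whether the score in a given year is lower than the scores from previous years. If it is, it becomes the athlete's all-time best score; if not, the formula prints whatever their previous best score was.

Apologies if this has been asked and answered before, but any help would be appreciated.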

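2 Answers:

Answer 0: (score: 0)

The Excel MINIFS function returns the smallest numeric value in a range that meets one or more criteria. Below is an example of a simple replication in R: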

# 1. Libraries
library(dplyr)

# 2. Data set
df <- data.frame(
  year = c(2000, 2000, 2000),
  athlete = c("Ato Boldon", "Brian Lewis", "Coby Miller"),
  event = c("100m", "100m", "200m"),
  score = c(9.95, 10.02, 9.98))

# 3. Replicate Excel 'MINIFS' function

# 3.1. One solution
df %>% 
  group_by(event) %>% 
  filter(score == min(score)) %>%
  ungroup()

# 3.2. Another solution
df %>% 
  group_by(event) %>% 
  mutate(min_score = ifelse(event == "200m", min(score), score)) %>%
  ungroup()

# 3.3. All-time best score per athlete (one row per athlete)
df_athlete_all_time <- df %>% 
  group_by(athlete) %>% 
  summarise(min_score_all_time = min(score))

# 3.4. Merge with original data
df_merge <- left_join(df, df_athlete_all_time, by = "athlete")

# 3.5. Find the 'year' in which the best score took place
df_merge %>% 
  filter(score == min_score_all_time)

# 3.6. Compare it to all the athlete's previous years' scores and print out the smaller of the two
# Homework :)
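
The remaining step, a best-so-far value that only looks at the current and earlier years, could be sketched with a cumulative minimum. This is a minimal sketch, not part of the original answer: it assumes the column is named `time` as in the question's data, that rows are sorted by year within each athlete and event, and the names `running_best` and `best_so_far` are made up for illustration.

```r
library(dplyr)

# Small sample taken from the question's data
df <- data.frame(
  year    = c(2000, 2001, 2004, 2004, 2005),
  event   = "100m",
  athlete = c("Ato Boldon", "Ato Boldon", "Ato Boldon",
              "Asafa Powell", "Asafa Powell"),
  time    = c(9.95, 9.88, 10.09, 9.87, 9.77)
)

running_best <- df %>%
  arrange(athlete, event, year) %>%      # cummin() depends on row order
  group_by(athlete, event) %>%           # one running minimum per athlete and event
  mutate(best_so_far = cummin(time)) %>% # lowest time up to and including this year
  ungroup()

running_best
```

For Ato Boldon, the 2004 row keeps the 9.88 set in 2001, because the 10.09 run that year did not improve on it. That matches the MINIFS behaviour described in the question.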

Answer 1: (score: 0)

# example data
df <- read.table(text = "
year    event   athlete time
2000    100m    AtoBoldon   9.95
2001    100m    AtoBoldon   10.02
2000    100m    CobyMiller  9.98
2003    100m    AtoBoldon   9.97
2001    100m    CobyMiller  9.96
2003    100m    CobyMiller  9.86
", header = TRUE)

library(dplyr)

df %>%
  group_by(athlete, event) %>%  # for each event and athlete
  mutate(best_time = min(time), # get minimum time
         # get year of minimum time; which.min() takes the first match,
         # so a tied best time does not cause a length error
         year_best_time = year[which.min(time)]) %>%
  ungroup()

# # A tibble: 6 x 6
#    year event athlete     time best_time year_best_time
#   <int> <fct> <fct>      <dbl>     <dbl>          <int>
# 1  2000 100m  AtoBoldon   9.95      9.95           2000
# 2  2001 100m  AtoBoldon  10.0       9.95           2000
# 3  2000 100m  CobyMiller  9.98      9.86           2003
# 4  2003 100m  AtoBoldon   9.97      9.95           2000
# 5  2001 100m  CobyMiller  9.96      9.86           2003
# 6  2003 100m  CobyMiller  9.86      9.86           2003
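
The output above gives one overall best per athlete. If the best time as of each year is wanted instead, as the question describes, a base R variant without dplyr might look like the following. It is a sketch under assumptions: the same example data, rows ordered by year within each group, and an illustrative column name `best_so_far`.

```r
# Same example data as above
df <- read.table(text = "
year    event   athlete time
2000    100m    AtoBoldon   9.95
2001    100m    AtoBoldon   10.02
2000    100m    CobyMiller  9.98
2003    100m    AtoBoldon   9.97
2001    100m    CobyMiller  9.96
2003    100m    CobyMiller  9.86
", header = TRUE)

# Order by year within athlete and event so the running minimum is chronological
df <- df[order(df$athlete, df$event, df$year), ]

# ave() applies cummin() separately within each athlete/event group
df$best_so_far <- ave(df$time, df$athlete, df$event, FUN = cummin)

df
```

Here `ave()` returns a vector the same length as `time`, so it can be assigned straight back as a new column.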