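I have a df with Premier League team ratings and another df with the schedule for the whole season. I want to attach each team's rating from the ratings df to the schedule so that I can work out probabilities for each match. The next step will be to simulate the whole season.
I tried writing an if statement to match the strings in df_1 against df_2, but I don't think I'm on the right track.
I'm sure this is basic coding for most people, and I appreciate your help. I did try on my own before coming here. Thank you very much.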
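library(dplyr)
# home teams and their ratings
vec_1 <- c("team_a", "team_b", "team_c")
vec_2 <- c(1.7, 1.2, 0.8)
# away teams and their ratings
vec_3 <- c("team_d", "team_e", "team_f")
vec_4 <- c(0.3, 0.5, 0.4)
# df_1: ratings df (tibble() replaces the deprecated data_frame())
df_1 <- tibble(team = vec_1, rating = vec_2)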
team rating
<chr> <dbl>
1 team_a 1.7
2 team_b 1.2
3 team_c 0.8
# df_2: schedule df (tibble() replaces the deprecated data_frame())
df_2 <- tibble(home_tm = vec_1, away_tm = vec_3)
home_tm away_tm
<chr> <chr>
1 team_a team_d
2 team_b team_e
3 team_c team_f
Desired result:
home_tm away_tm home_tm_rat away_tm_rat
<chr> <chr> <dbl> <dbl>
1 team_a team_d 1.7 0.3
2 team_b team_e 1.2 0.5
3 team_c team_f 0.8 0.4
Answer 0 (score: 1)
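As mentioned above, have a look at the join functions in dplyr. Join the ratings onto the schedule twice, once on the home team and once on the away team, renaming the rating column after each join (the output below assumes df_1 holds a rating for every team, including the away teams):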
library(dplyr)
df_2 %>%
  # look up the home team's rating
  left_join(df_1, by = c('home_tm' = 'team')) %>%
  rename(home_tm_rat = rating) %>%
  # look up the away team's rating
  left_join(df_1, by = c('away_tm' = 'team')) %>%
  rename(away_tm_rat = rating)
# A tibble: 3 x 4
home_tm away_tm home_tm_rat away_tm_rat
<chr> <chr> <dbl> <dbl>
1 team_a team_d 1.7 0.3
2 team_b team_e 1.2 0.5
3 team_c team_f 0.8 0.4
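For the away-team ratings to come through, the ratings table needs a row for every team in the schedule. Below is a minimal sketch of that setup. It assumes that vec_4 in the question holds the ratings of the away teams in vec_3, which the question implies but does not state. The names `ratings`, `schedule` and `result` are illustrative only.

```r
library(dplyr)

# Vectors from the question
vec_1 <- c("team_a", "team_b", "team_c")
vec_2 <- c(1.7, 1.2, 0.8)
vec_3 <- c("team_d", "team_e", "team_f")
vec_4 <- c(0.3, 0.5, 0.4)

# Assumption: vec_4 holds the ratings of the teams in vec_3,
# so the ratings table covers all six teams
ratings  <- tibble(team = c(vec_1, vec_3), rating = c(vec_2, vec_4))
schedule <- tibble(home_tm = vec_1, away_tm = vec_3)

# Rename the rating column before each join so no separate rename() step is needed
result <- schedule %>%
  left_join(rename(ratings, home_tm_rat = rating), by = c("home_tm" = "team")) %>%
  left_join(rename(ratings, away_tm_rat = rating), by = c("away_tm" = "team"))

result
```

Any team in the schedule that is missing from the ratings table ends up with NA in its rating column, so checking for NA after the join is a cheap way to catch misspelled team names.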
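Answer 1 (score: 0)
Similar to @liuminzhao's answer, but I'd also suggest thinking a little about your data structure. If you put all the teams in df_2 into a single column and use a separate column to indicate who is home and who is away, things get easier. It is worth reading up on tidy data.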
library(tidyverse)
df_2 %>%
  # gather the two columns of teams into a single column, using another column to indicate home/away
  gather(key = HomeAway, value = team) %>%
  # join the team ratings
  left_join(df_1, by = "team")
# A tibble: 6 x 3
HomeAway team rating
<chr> <chr> <dbl>
1 home_tm team_a 1.7
2 home_tm team_b 1.2
3 home_tm team_c 0.8
4 away_tm team_d NA
5 away_tm team_e NA
6 away_tm team_f NA
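The NA ratings for the away teams arise because df_1, as defined in the question, only contains team_a to team_c. Below is a sketch of the same long-format idea with a complete ratings table. It assumes that vec_4 holds the ratings of the away teams in vec_3. It uses pivot_longer(), which supersedes gather() in tidyr 1.0.0 and later, and adds a hypothetical `match_id` column so that each pair of rows can be traced back to its fixture.

```r
library(dplyr)
library(tidyr)

# Vectors from the question
vec_1 <- c("team_a", "team_b", "team_c")
vec_2 <- c(1.7, 1.2, 0.8)
vec_3 <- c("team_d", "team_e", "team_f")
vec_4 <- c(0.3, 0.5, 0.4)

# Assumption: vec_4 holds the ratings of the teams in vec_3
ratings  <- tibble(team = c(vec_1, vec_3), rating = c(vec_2, vec_4))
schedule <- tibble(home_tm = vec_1, away_tm = vec_3)

long <- schedule %>%
  # match_id is a hypothetical helper column, not part of the original data
  mutate(match_id = row_number()) %>%
  # one row per team per match, with HomeAway recording which side it is
  pivot_longer(c(home_tm, away_tm), names_to = "HomeAway", values_to = "team") %>%
  # a single join now covers both home and away teams
  left_join(ratings, by = "team")

long
```

Keeping an id per fixture matters because, once the teams are stacked into one column, nothing else records which home team plays which away team.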