I have a dataset of unique matches like this. Each row is one match together with its result.
date <- c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01')
team1 <- c('A','B','B','C','D','A','B')
team1_score <- c(1,0,4,3,5,6,7)
team2 <- c('B','A','A','B','C','C','A')
team2_score <- c(0,1,5,4,6,9,10)
matches <- data.frame(date, team1, team1_score, team2, team2_score)
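I want to create 2 new columns: the form of team1 and the form of team2. The result of a match is determined by which team has the higher score, or it is a draw. The desired result is shown below. The form would be the results of team1 in its last 2 matches. For example, for the first 3 rows, the form of team1 and team2 respectively. Sometimes a team has not yet played 2 previous matches, in which case a NULL result is fine. I want to know the form of team1 and team2 going into each match.
In the real dataset there are many more than 4 unique teams. I have been thinking about this but cannot come up with a good way to create these two variables.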
Answer 0 (score: 0)
Here is my solution:
library(tidyverse)
date <- as.Date(c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01', '2017/05/30'))
team1 <- c('A','B','B','C','D','A','B','A')
team1_score <- c(1,0,4,3,5,6,7,0)
team2 <- c('B','A','A','B','C','C','A','D')
team2_score <- c(0,1,5,4,6,9,10,0)
matches <- data.frame(date, team1, team1_score, team2, team2_score)
## 1. Create a unique identifier for each match. It assumes that teams can only play each other once a day.
matches$UID <- paste(matches$date, matches$team1, matches$team2, sep = "-")
## 2. Create a score difference variable from team1's point of view
matches <- matches %>% mutate(score_dif_team1 = team1_score - team2_score)
## 3. Create a Result (WDL) reflecting team1's results
matches <- matches %>% mutate(results_team1 = if_else(score_dif_team1 < 0, true = "L", false = if_else(score_dif_team1 > 0, true = "W", false = "D")))
## 4. Cosmetic step: Reorder variables for easier comparison across variables
matches <- matches %>% select(UID, date:results_team1)
## 5. Reshape the table into a long format based on the teams. Each observation will now reflect the results of 1 team within a match. Each game will have two observations.
matches <- matches %>% gather(key = old_team_var, value = team, team1, team2)
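As a side note to step 5, `gather()` still works but recent tidyr versions recommend `pivot_longer()` for the same reshape. A minimal sketch, assuming tidyr 1.0.0 or later; the `toy` data frame and its values are made up for illustration and only mimic the columns of `matches`:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same kind of columns as `matches` before step 5.
toy <- data.frame(
  UID = c("m1", "m2"),
  team1 = c("A", "B"),
  team2 = c("B", "C"),
  results_team1 = c("W", "D"),
  stringsAsFactors = FALSE
)

# One row per team per match; the source column name is kept in old_team_var.
toy_long <- toy %>%
  pivot_longer(cols = c(team1, team2),
               names_to = "old_team_var",
               values_to = "team")
```

Note that `pivot_longer()` keeps the two rows of each match next to each other, whereas `gather()` stacks all team1 rows before all team2 rows. The later steps do not depend on row order.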
## 6. Establish a common results variable for each observation. It inverts results_team1 for team2 rows and keeps results_team1 unchanged for team1 rows.
matches <- matches %>%
  mutate(results = if_else(old_team_var == "team2",
                           true = if_else(results_team1 == "W",
                                          true = "L",
                                          false = if_else(results_team1 == "L",
                                                          true = "W",
                                                          false = "D")),
                           false = results_team1))
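The inversion in step 6 can also be written with a named lookup vector instead of nested `if_else()` calls. This is only a sketch in base R on stand-alone vectors; the name `flip` and the sample values are illustrative and not part of the original answer:

```r
# Named vector mapping team1's result to the opponent's result.
flip <- c(W = "L", L = "W", D = "D")

# Illustrative inputs: team1's result and which side each row describes.
results_team1 <- c("W", "L", "D", "W")
old_team_var  <- c("team1", "team2", "team2", "team2")

# Flip the result only for team2 rows; unname() drops the lookup names.
results <- ifelse(old_team_var == "team2",
                  unname(flip[results_team1]),
                  results_team1)
```

The same expression should work inside `mutate()` on the long table, since it only uses vectorised indexing.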
## Final step: Filter the matches table by the dates you are interested in, then reshape it into a table of D/L/W counts per team.
Results_table <- matches %>%
  filter(date <= as.Date("2017-12-01")) %>%
  group_by(team, results) %>%
  summarise(cases = n()) %>%
  spread(key = results, value = cases, fill = 0)
## Results:
# A tibble: 4 x 4
# Groups: team [4]
  team      D     L     W
* <chr> <dbl> <dbl> <dbl>
1 A         1     1     4
2 B         0     4     1
3 C         0     1     2
4 D         1     1     0
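The table above counts all results per team, while the question asks for each team's form going into a match, meaning its previous 2 results. One possible way to get that from the long table is `dplyr::lag()` within each team. This is a rough sketch, not part of the original answer; the `long` data frame below is a hand-built subset (teams A and B, the three most recent matches) and the column name `form` is an assumption:

```r
library(dplyr)

# Hand-built long table: one row per team per match, as produced by step 6.
long <- data.frame(
  date = as.Date(c("2017-12-01", "2017-12-01", "2017-11-01",
                   "2017-11-01", "2017-10-01", "2017-10-01")),
  team = c("A", "B", "B", "A", "B", "A"),
  results = c("W", "L", "L", "W", "L", "W"),
  stringsAsFactors = FALSE
)

form <- long %>%
  arrange(team, date) %>%   # oldest match first within each team
  group_by(team) %>%
  mutate(form = if_else(is.na(lag(results, 2)),
                        NA_character_,   # fewer than 2 earlier matches
                        paste0(lag(results, 2), lag(results, 1)))) %>%
  ungroup()
```

Here `form` reads oldest result first, so "WL" would mean a win two matches ago followed by a loss in the previous match. Joining `form` back onto the wide table twice, once by `team1` and once by `team2` (together with the date or `UID`), should give the two columns the question asks for.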