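Matching rows within the same dataset in R

Time: 2017-12-06 14:54:28

Tags: r matching

I have a dataset of unique matches like the one below. Each row is one match together with its result.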

    date <- c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01')
    team1 <- c('A','B','B','C','D','A','B')
    team1_score <- c(1,0,4,3,5,6,7)
    team2 <- c('B','A','A','B','C','C','A')
    team2_score <- c(0,1,5,4,6,9,10)
    matches <- data.frame(date, team1, team1_score, team2, team2_score)

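I want to create 2 new columns holding the form of team1 and team2. The result of a match is determined by which team has the higher score, or it is a draw. Form is a team's results in its last 2 matches, so I want to know the form of team1 and team2 going into each match. The expected values for the first 3 rows, for team1 and team2 respectively, are shown below. Sometimes a team has not yet played 2 matches, in which case a NULL result is enough.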

  • Form1: W-W, L-W, W-L
  • Form2: L-L, W-L, L-W
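
To make the result rule concrete, here is a minimal base R sketch that derives W/D/L for each side from the sign of the score difference. It uses the scores from the data above; `outcomes`, `result_team1` and `result_team2` are illustrative names, not part of the original dataset.

```r
# Scores from the dataset above (7 matches, most recent first).
team1_score <- c(1, 0, 4, 3, 5, 6, 7)
team2_score <- c(0, 1, 5, 4, 6, 9, 10)

# sign() returns -1, 0 or 1; adding 2 maps that to positions 1, 2, 3.
outcomes <- c("L", "D", "W")  # hypothetical helper vector
result_team1 <- outcomes[sign(team1_score - team2_score) + 2]
result_team2 <- outcomes[sign(team2_score - team1_score) + 2]

print(data.frame(result_team1, result_team2))
```

Form is then just these per-match results, looked up for each team's two most recent earlier matches.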

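The actual dataset has many more than 4 unique teams. I have been thinking about this for a while but cannot come up with a good way to create these two variables.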

1 Answer:

Answer 0: (score: 0)

Here is my solution:

    library(tidyverse)


    date <- as.Date(c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01', '2017/05/30'))
    team1 <- c('A','B','B','C','D','A','B','A')
    team1_score <- c(1,0,4,3,5,6,7,0)
    team2 <- c('B','A','A','B','C','C','A','D')
    team2_score <- c(0,1,5,4,6,9,10,0)
    matches <- data.frame(date, team1, team1_score, team2, team2_score)

    ## 1. Create a unique identifier for each match. It assumes that teams can only play each other once a day.
    matches$UID <- paste(matches$date, matches$team1, matches$team2, sep = "-")

    ## 2. Create a score difference variable from team1's perspective
    matches <- matches %>% mutate(score_dif_team1 = team1_score - team2_score)

    ## 3. Create a Result (WDL) reflecting team1's results
    matches <- matches %>% mutate(results_team1 = if_else(score_dif_team1 < 0, true = "L", false = if_else(score_dif_team1 > 0, true = "W", false = "D")))

    ## 4. Cosmetic step: Reorder variables for easier comparison across variables
    matches <- matches %>% select(UID, date:results_team1)

    ## 5. Reshape the table into a long format based on the teams. Each observation will now reflect the results of 1 team within a match. Each game will have two observations.
    matches <- matches %>% gather(key = old_team_var, value = team, team1, team2)

    ## 6. Establish a common results variable for each observation. It inverts results_team1 for team2 rows and keeps it unchanged for team1 rows.
    matches <- matches %>%
                mutate(results = case_when(
                    old_team_var == "team1" ~ results_team1,  # team1 rows: keep as is
                    results_team1 == "W" ~ "L",               # team2 rows: a team1 win is a loss
                    results_team1 == "L" ~ "W",               # team2 rows: a team1 loss is a win
                    TRUE ~ "D"))                              # draws stay draws

    ## Final step: Filter the matches table by the dates you are interested in, then reshape it into a wide table with one row per team and one column each for D, L and W.

    Results_table <- matches %>% filter(date <= as.Date("2017-12-01")) %>% group_by(team, results) %>% summarise(cases = n()) %>% spread(key = results, value = cases, fill = 0)

    ## Results:
    # A tibble: 4 x 4
    # Groups:   team [4]
       team     D     L     W
    * <chr> <dbl> <dbl> <dbl>
    1     A     1     1     4
    2     B     0     4     1
    3     C     0     1     2
    4     D     1     1     0
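
The table above counts each team's overall results, whereas the question asks for each team's form going into a match. What follows is a rough base R sketch of one way to get there, not part of the original answer. It uses the 7 matches from the question, assumes a team plays at most once per day, and uses `NA` where the question mentions NULL. The names `result_for`, `long`, `form_before`, `form1` and `form2` are made up for illustration.

```r
# The 7 matches from the question.
matches <- data.frame(
  date = as.Date(c("2017/12/01", "2017/11/01", "2017/10/01", "2017/09/01",
                   "2017/08/01", "2017/07/01", "2017/06/01"),
                 format = "%Y/%m/%d"),
  team1 = c("A", "B", "B", "C", "D", "A", "B"),
  team1_score = c(1, 0, 4, 3, 5, 6, 7),
  team2 = c("B", "A", "A", "B", "C", "C", "A"),
  team2_score = c(0, 1, 5, 4, 6, 9, 10),
  stringsAsFactors = FALSE
)

# W/D/L from the point of view of the team that scored `own`.
result_for <- function(own, other) c("L", "D", "W")[sign(own - other) + 2]

# Long table: one row per team per match.
long <- rbind(
  data.frame(date = matches$date, team = matches$team1,
             result = result_for(matches$team1_score, matches$team2_score),
             stringsAsFactors = FALSE),
  data.frame(date = matches$date, team = matches$team2,
             result = result_for(matches$team2_score, matches$team1_score),
             stringsAsFactors = FALSE)
)

# Form going into a match: the team's n most recent earlier results,
# newest first, or NA when the team has fewer than n earlier matches.
form_before <- function(team, date, n = 2) {
  prior <- long[long$team == team & long$date < date, ]
  if (nrow(prior) < n) return(NA_character_)
  prior <- prior[order(prior$date, decreasing = TRUE), ]
  paste(prior$result[seq_len(n)], collapse = "-")
}

rows <- seq_len(nrow(matches))
matches$form1 <- vapply(rows, function(i)
  form_before(matches$team1[i], matches$date[i]), character(1))
matches$form2 <- vapply(rows, function(i)
  form_before(matches$team2[i], matches$date[i]), character(1))

print(matches[, c("date", "team1", "form1", "team2", "form2")])
```

For the first three rows this gives W-W, L-W, W-L for `form1` and L-L, W-L, L-W for `form2`, matching the values listed in the question. The row-by-row lookup is quadratic in the number of matches, so a grouped lag (for example `dplyr::lag` within `group_by(team)`) may be a better fit for a large dataset.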