Using dplyr

Time: 2019-12-25 14:56:33

Tags: r dplyr mutate

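This is my first time posting a question here, so please be gentle :)
I have a data frame containing goal and corner statistics from English football (the Premier League), one row per match. Calling head() on it gives you something like this (made-up data):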

| Home          | Home_goals    | Away          | Away_goals    | Home_Corners  | Away_Corners  |
|------------   |------------   |-----------    |------------   |-------------- |-------------- |
| Tottenham     | 1             | Arsenal       | 0             | 5             | 2             |
| Man United    | 2             | Watford       | 1             | 7             | 4             |
| Man City      | 3             | West Ham      | 0             | 10            | 2             |
| Chelsea       | 2             | Arsenal       | 1             | 7             | 6             |
| Tottenham     | 4             | Norwich       | 1             | 6             | 0             |
| Man United    | 2             | Liverpool     | 2             | 4             | 7             |
| Tottenham     | 0             | Man City      | 2             | 3             | 8             |

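For each entry in the Home column (Tottenham in this example), I want to find the next two rows with the same home team (rows 5 and 7) and paste them into new columns on row 1. I want to do this for every row in the data frame and keep all rows. I just want to add the stats of the next two matches as new columns:
Home_2
Home_goals_2
Away_2, and so on.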

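To be honest, I don't even know how to search for this on Google, and from my experience with Stack Overflow I'm sure some of you will solve it within minutes :) Any help is much appreciated.

Thanks a lot in advance
Philip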

Edit:

I don't really know whether I can attach things here, but the data frame looks like this:

premierleague <- data.frame("Home" = c("Tottenham", "ManUnited", "ManCity", "Chelsea", "Tottenham", "ManUnited", "Tottenham"), 
                            "Home_goals" = c(1,2,3,2,4,2,0), 
                            "Away" = c("Arsenal", "Watford", "Westham", "Arsenal", "Norwich", "Liverpool", "ManCity"), 
                            "Away_goals" = c(0,1,0,1,1,2,2), 
                            "Home_corners" = c(5,7,10,7,6,4,3), 
                            "Away_corners" = c(2,4,2,6,0,7,8))

### The desired result looks like this

premierleague_new <- data.frame(
  "Home" = c("Tottenham", "ManUnited", "ManCity", "Chelsea", "Tottenham", "ManUnited", "Tottenham"), 
  "Home_goals" = c(1,2,3,2,4,2,0), 
  "Away" = c("Arsenal", "Watford", "Westham", "Arsenal", "Norwich", "Liverpool", "ManCity"), 
  "Away_goals" = c(0,1,0,1,1,2,2), 
  "Home_corners" = c(5,7,10,7,6,4,3), 
  "Away_corners" = c(2,4,2,6,0,7,8),
  "Home_goals_2" = c(4,2,NA,NA,0,NA,NA),
  "Away_2" = c("Norwich", "Liverpool",NA,NA,"ManCity",NA,NA),
  "Away_goal_2" = c(1,2,NA,NA,2,NA,NA),
  "Home_corn_2" = c(6,4,NA,NA,3,NA,NA),
  "Away_corn_2" = c(0,7,NA,NA,8,NA,NA),
  "Home_goal_3" = c(0,NA,NA,NA,NA,NA,NA),
  "Away_3" = c("ManCity",NA,NA,NA,NA,NA,NA),
  "Away_goal_3" = c(2,NA,NA,NA,NA,NA,NA),
  "Home_corners_3" = c(3,NA,NA,NA,NA,NA,NA),
  "Away_corners_3" = c(8,NA,NA,NA,NA,NA,NA)
                                 )

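Tottenham is the only team that appears three times as the home team, so all of the new columns are filled in for Tottenham in the first row.

Row 5, Tottenham's second entry, only has values for the second match, because in this example only one more match with Tottenham as the home team follows.

I hope this is clearer now. It should at least be reproducible.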

1 answer:

Answer 0: (score: 1)

We can group_by Home and use lead to get the values from the following rows within each group.
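To see what lead does on its own, here is a minimal sketch using Tottenham's three home-goal values from the question. The variable names are made up for illustration only.

```r
library(dplyr)

# Tottenham's home goals in match order, taken from the question's data
# (rows 1, 5 and 7); the vector name is hypothetical.
tottenham_goals <- c(1, 4, 0)

# lead() shifts the values up by n positions and pads the end with NA
next_match  <- lead(tottenham_goals)     # the following match
match_after <- lead(tottenham_goals, 2)  # the match after that
```

Inside group_by(Home), the same shift happens separately for each home team, which is why rows of other teams in between do not matter.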

library(dplyr)

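# Within each home team, pull in the values of the next match (suffix _2)
# and of the match after that (suffix _3)
premierleague %>%
  group_by(Home) %>%
  mutate_at(vars(Home_goals:Away_corners), list(`2` = ~lead(.), `3` = ~lead(., 2)))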


#  Home  Home_goals Away  Away_goals Home_corners Away_corners Home_goals_2 Away_2
#  <fct>      <dbl> <fct>      <dbl>        <dbl>        <dbl>        <dbl> <fct> 
#1 Tott…          1 Arse…          0            5            2            4 Norwi…
#2 ManU…          2 Watf…          1            7            4            2 Liver…
#3 ManC…          3 West…          0           10            2           NA NA    
#4 Chel…          2 Arse…          1            7            6           NA NA    
#5 Tott…          4 Norw…          1            6            0            0 ManCi…
#6 ManU…          2 Live…          2            4            7           NA NA    
#7 Tott…          0 ManC…          2            3            8           NA NA    
# … with 8 more variables: Away_goals_2 <dbl>, Home_corners_2 <dbl>,
#   Away_corners_2 <dbl>, Home_goals_3 <dbl>, Away_3 <fct>, Away_goals_3 <dbl>,
#   Home_corners_3 <dbl>, Away_corners_3 <dbl>
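
As a side note, mutate_at has been superseded since dplyr 1.0.0. A rough equivalent with across() might look like the sketch below. It assumes dplyr 1.0.0 or later and that the rows are already in chronological order, as the question implies. The name premierleague_new is borrowed from the question; the resulting column names follow the `<column>_2` / `<column>_3` pattern rather than the exact names in the desired output.

```r
library(dplyr)

# Data frame copied from the question
premierleague <- data.frame(
  Home = c("Tottenham", "ManUnited", "ManCity", "Chelsea",
           "Tottenham", "ManUnited", "Tottenham"),
  Home_goals = c(1, 2, 3, 2, 4, 2, 0),
  Away = c("Arsenal", "Watford", "Westham", "Arsenal",
           "Norwich", "Liverpool", "ManCity"),
  Away_goals = c(0, 1, 0, 1, 1, 2, 2),
  Home_corners = c(5, 7, 10, 7, 6, 4, 3),
  Away_corners = c(2, 4, 2, 6, 0, 7, 8)
)

premierleague_new <- premierleague %>%
  group_by(Home) %>%
  mutate(across(
    Home_goals:Away_corners,
    # the list names become the suffixes: <column>_2 and <column>_3
    list(`2` = ~ lead(.x), `3` = ~ lead(.x, 2))
  )) %>%
  ungroup()  # drop the grouping so later steps see the whole table
```

The ungroup() at the end is optional here; it only avoids carrying the grouping into whatever is done with the result afterwards.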