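I'm trying to compute, for each player, the total number of goals, primary assists and secondary assists. My problem is that I can't work out the logic for doing this, because the values I want to summarise by (player names) are spread across three variables (goal, primary_assist and secondary_assist).
Here is my reproducible data (from dput(); apologies for the mess).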
mydata <- structure(list(primary_assist = c("Dmitry Gilyazitdinov", "Evgeny Orlov",
"Anton Burdasov", "Sergei Kalinin", "Stanislav Solovyov", "Vasily Streltsov",
NA, "Bogdan Potekhin", "Bogdan Potekhin", "Vasily Streltsov",
"Vasily Streltsov", "Viktor Postnikov", "Danil Kaskov", NA, NA,
"Artemy Panarin"), secondary_assist = c("Andrei Badrutdinov",
NA, NA, NA, "Danil Gubarev", "Nikita Manukhov", NA, "Evgeny Grigorenko",
"Daniil Apalkov", "Ivan Boiko", NA, "Viktor Antipin", "Vitaly Sychov",
NA, NA, "Stanislav Levin"), goal = c("Vitaly Kropachyov", "Dmitry Kozlov",
"Stanislav Solovyov", "Kirill Polyansky", "Anton Burdasov", "Ilya Solodov",
"Alexander Antropov", "Daniil Apalkov", "Evgeny Grigorenko",
"Alexander Antropov", "Alexander Antropov", "Evgeny Grigorenko",
"Denis Belonogov", "Vitaly Sychov", "Alexander Streltsov", "Pyotr Kopyttsov"
), team = c("Belye Medvedi", "Omskie Yastreby", "Belye Medvedi",
"Omskie Yastreby", "Belye Medvedi", "Avto", "Avto", "Stalnye Lisy",
"Stalnye Lisy", "Avto", "Avto", "Stalnye Lisy", "Avto", "Avto",
"Avto", "Russkie Vityazi"), game_strength = c("PP", "EV", "EV",
"EV", "EV", "PP", "SO", "EV", "PP", "PP", "EV", "PP", "PP", "EV",
"PP", "EV"), season = c("2009-10", "2009-10", "2009-10", "2009-10",
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10",
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-16L), .Names = c("primary_assist", "secondary_assist", "goal",
"team", "game_strength", "season"))
mydata
#> # A tibble: 16 x 6
#> primary_assist secondary_assist goal team game_strength season
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Dmitry Gilyazitdinov Andrei Badrutdin~ Vita~ Bely~ PP 2009-~
#> 2 Evgeny Orlov <NA> Dmit~ Omsk~ EV 2009-~
#> 3 Anton Burdasov <NA> Stan~ Bely~ EV 2009-~
#> 4 Sergei Kalinin <NA> Kiri~ Omsk~ EV 2009-~
#> 5 Stanislav Solovyov Danil Gubarev Anto~ Bely~ EV 2009-~
#> 6 Vasily Streltsov Nikita Manukhov Ilya~ Avto PP 2009-~
#> 7 <NA> <NA> Alex~ Avto SO 2009-~
#> 8 Bogdan Potekhin Evgeny Grigorenko Dani~ Stal~ EV 2009-~
#> 9 Bogdan Potekhin Daniil Apalkov Evge~ Stal~ PP 2009-~
#> 10 Vasily Streltsov Ivan Boiko Alex~ Avto PP 2009-~
#> 11 Vasily Streltsov <NA> Alex~ Avto EV 2009-~
#> 12 Viktor Postnikov Viktor Antipin Evge~ Stal~ PP 2009-~
#> 13 Danil Kaskov Vitaly Sychov Deni~ Avto PP 2009-~
#> 14 <NA> <NA> Vita~ Avto EV 2009-~
#> 15 <NA> <NA> Alex~ Avto PP 2009-~
#> 16 Artemy Panarin Stanislav Levin Pyot~ Russ~ EV 2009-~
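So I want to count the goals, primary assists and secondary assists for each player, with one row per player. Suppose the name "Artemy Panarin" is listed 1 time in goal, 0 times in primary_assist and 2 times in secondary_assist; my output would then look like this: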
tibble::tibble(name = c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"), team = c("Russkie Vityazi", "Russkie Vityazi", "Avto"), goals = c(1, 1, 0), primary_assists = c(0, 0, 1), secondary_assists = c(2, 0, 0))
#> # A tibble: 3 x 5
#> name team goals primary_assists secondary_assists
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Artemy Panarin Russkie Vityazi 1.00 0 2.00
#> 2 Stanislav Levin Russkie Vityazi 1.00 0 0
#> 3 Danil Kaskov Avto 0 1.00 0
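Does this make sense? Any ideas? A tidyverse solution is preferred. Thanks!
Answer 0: (score: 6)
We can gather the data into "long" format, group by the "name", "team" and "key" columns (the latter is created by gather), summarise to count the rows in each group, and then spread back to "wide" format.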
library(tidyverse)
gather(mydata, key, name, primary_assist:goal) %>%
group_by(name, team, key) %>%
summarise(n = n()) %>%
spread(key, n, fill = 0)
# A tibble: 30 x 5
# Groups: name, team [30]
# name team goal primary_assist secondary_assist
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 Alexander Antropov Avto 3 0 0
# 2 Alexander Streltsov Avto 1 0 0
# 3 Andrei Badrutdinov Belye Medvedi 0 0 1
# 4 Anton Burdasov Belye Medvedi 1 1 0
# 5 Artemy Panarin Russkie Vityazi 0 1 0
# 6 Bogdan Potekhin Stalnye Lisy 0 2 0
# 7 Daniil Apalkov Stalnye Lisy 1 0 1
# 8 Danil Gubarev Belye Medvedi 0 0 1
# 9 Danil Kaskov Avto 0 1 0
#10 Denis Belonogov Avto 1 0 0
# ... with 20 more rows
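In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same long, count, wide pipeline can be sketched as follows. The `toy` tibble and its player names are made up for illustration and only mimic the layout of mydata. Here `values_drop_na = TRUE` also drops the missing assists, which the gather() call above appears to keep as rows for an NA player.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column layout as mydata
toy <- tibble(
  primary_assist   = c("A", NA, "B"),
  secondary_assist = c("B", NA, NA),
  goal             = c("C", "A", "A"),
  team             = c("X", "X", "X")
)

player_totals <- toy %>%
  # long format: one row per (event type, player), dropping missing assists
  pivot_longer(primary_assist:goal, names_to = "key", values_to = "name",
               values_drop_na = TRUE) %>%
  # count rows per player, team and event type
  count(name, team, key) %>%
  # back to wide format: one column per event type, 0 where a player has none
  pivot_wider(names_from = key, values_from = n, values_fill = list(n = 0))

player_totals
```

Passing `values_fill` as a named list is a conservative choice that should work across tidyr 1.x releases.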
Answer 1: (score: 3)
One way to get the result is to reshape the data with gather()/spread() in addition to a summarising strategy.
library(tidyverse)
scoring_summary <- mydata %>%
select(primary_assist:team) %>%
gather("key", "player", -team) %>%
group_by(player) %>%
count(key) %>%
spread(key, n)
# convert NAs to 0
scoring_summary[is.na(scoring_summary)] <- 0
scoring_summary
# A tibble: 28 x 4
# Groups: player [28]
player goal primary_assist secondary_assist
<chr> <dbl> <dbl> <dbl>
1 Alexander Antropov 3 0 0
2 Alexander Streltsov 1 0 0
3 Andrei Badrutdinov 0 0 1
4 Anton Burdasov 1 1 0
5 Artemy Panarin 0 1 0
6 Bogdan Potekhin 0 2 0
7 Daniil Apalkov 1 0 1
8 Danil Gubarev 0 0 1
9 Danil Kaskov 0 1 0
10 Denis Belonogov 1 0 0
count() is the counterpart of your original attempt with summarise(count(goals))
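To make the role of count() concrete, here is a minimal sketch showing that it gives the same counts as an explicit group_by() followed by summarise(n = n()). The `long` tibble is hypothetical long-format data, not the poster's data.

```r
library(dplyr)

# Hypothetical long-format data: one row per (player, event type)
long <- tibble(
  player = c("A", "A", "B", "A", "C"),
  key    = c("goal", "goal", "primary_assist", "primary_assist", "goal")
)

# count() groups by the given columns and tallies the rows in one step
via_count <- long %>%
  count(player, key)

# the explicit equivalent: group, summarise with n(), then drop the grouping
via_summarise <- long %>%
  group_by(player, key) %>%
  summarise(n = n()) %>%
  ungroup()

# compare the two results as plain data frames
same_result <- isTRUE(all.equal(as.data.frame(via_count),
                                as.data.frame(via_summarise)))
same_result
```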
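Answer 2: (score: 2)
You can use gather and spread. First gather the goal and assist columns into a "key" column, then group by key and player. You can convert the NAs to 0s afterwards.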
library(tidyverse)
mydata_tidy <- mydata %>%
gather(key = "key", value = "player", primary_assist, secondary_assist, goal) %>%
na.omit()
mydata_tidy %>%
group_by(key, player) %>%
summarize(count = n()) %>%
spread(key, count) %>%
filter(player %in% c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"))
#> # A tibble: 3 x 4
#> player goal primary_assist secondary_assist
#> <chr> <int> <int> <int>
#> 1 Artemy Panarin NA 1 NA
#> 2 Danil Kaskov NA 1 NA
#> 3 Stanislav Levin NA NA 1
Created on 2018-07-18 by the reprex package (v0.2.0).
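For the last step mentioned in this answer (turning the NAs into 0s), one option is tidyr's replace_na(). This is a minimal sketch on a hypothetical `wide` tibble shaped like the output above. Passing `fill = 0` to spread(), as the first answer does, should achieve the same result without a separate step.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide result with NAs where a player has no events of that type
wide <- tibble(
  player           = c("A", "B"),
  goal             = c(NA, 2L),
  primary_assist   = c(1L, NA),
  secondary_assist = c(NA_integer_, NA_integer_)
)

# replace NA with 0 in each count column
wide_filled <- wide %>%
  replace_na(list(goal = 0L, primary_assist = 0L, secondary_assist = 0L))

wide_filled
```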