I have a data frame of NBA player statistics, scraped from basketball-reference.com, that looks like this:
Player | Pos | Team | Games | Min | Points
Alex Abrines | SG | OKC | 68 | 15.5 | 6.0
Quincy Acy | PF | TOT | 38 | 14.7 | 5.8
Quincy Acy | PF | DAL | 6 | 8.0 | 2.2
Quincy Acy | PF | BRK | 32 | 15.9 | 6.5
Steven Adams | C | OKC | 80 | 29.9 | 11.3
Arron Afflalo | SG | SAC | 61 | 25.9 | 8.4
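Players who spent the whole season with one team (such as Abrines, Adams, and Afflalo) appear only once. But if a player played for more than one team (such as Quincy Acy), the data frame contains one row for each team he played for, plus an additional "TOT" (total) row.
I want to get back a data frame with exactly one row per player, where for multi-team players that row is the "TOT" row and the other rows are dropped. I'm a bit stuck.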
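The most sensible approach is probably to key on "TOT" in the Team column, but one thing that is always true of the total row, for a player who has one, is that its Games value is higher than the Games value in that player's other rows.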
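Answer 0 (score: 1)
We can group by player and use filter, keeping a row if it is the "TOT" row or if it is the only row in its group: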
library(dplyr)
df1 %>%
group_by(Player, Pos) %>%
filter(Team == "TOT" | n()==1)
# A tibble: 4 x 6
# Groups: Player, Pos [4]
# Player Pos Team Games Min Points
# <chr> <chr> <chr> <int> <dbl> <dbl>
#1 Alex Abrines SG OKC 68 15.5 6.0
#2 Quincy Acy PF TOT 38 14.7 5.8
#3 Steven Adams C OKC 80 29.9 11.3
#4 Arron Afflalo SG SAC 61 25.9 8.4
A similar approach with data.table is:
library(data.table)
setDT(df1)[, .SD[Team=="TOT"|.N==1], .(Player, Pos)]
# Data used in the examples above
df1 <- structure(list(Player = c("Alex Abrines", "Quincy Acy", "Quincy Acy",
"Quincy Acy", "Steven Adams", "Arron Afflalo"), Pos = c("SG",
"PF", "PF", "PF", "C", "SG"), Team = c("OKC", "TOT", "DAL", "BRK",
"OKC", "SAC"), Games = c(68L, 38L, 6L, 32L, 80L, 61L), Min = c(15.5,
14.7, 8, 15.9, 29.9, 25.9), Points = c(6, 5.8, 2.2, 6.5, 11.3,
8.4)), .Names = c("Player", "Pos", "Team", "Games", "Min", "Points"
), class = "data.frame", row.names = c(NA, -6L))
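The question also notes that a player's "TOT" row always has a higher Games value than his other rows. As a minimal sketch of that alternative (not part of the original answer), one could keep the row with the largest Games value per player instead of matching on the Team label. This assumes the observation holds for every player in the data; the name `res` is just an illustrative choice, and `slice_max()` requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample data as above, rebuilt here so the snippet runs on its own.
df1 <- data.frame(
  Player = c("Alex Abrines", "Quincy Acy", "Quincy Acy",
             "Quincy Acy", "Steven Adams", "Arron Afflalo"),
  Pos    = c("SG", "PF", "PF", "PF", "C", "SG"),
  Team   = c("OKC", "TOT", "DAL", "BRK", "OKC", "SAC"),
  Games  = c(68L, 38L, 6L, 32L, 80L, 61L),
  Min    = c(15.5, 14.7, 8, 15.9, 29.9, 25.9),
  Points = c(6, 5.8, 2.2, 6.5, 11.3, 8.4),
  stringsAsFactors = FALSE
)

# Assumption: within each player, the "TOT" row has the largest Games value,
# so keeping the max-Games row per group keeps "TOT" for multi-team players
# and the single row for everyone else.
res <- df1 %>%
  group_by(Player, Pos) %>%
  slice_max(Games, n = 1, with_ties = FALSE) %>%  # exactly one row per group
  ungroup()

res
```

Filtering on `Team == "TOT"` is more explicit about intent. The Games-based version is mainly useful if the team label is missing or inconsistent, and it would silently pick the wrong row if the assumption about Games ever failed.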