By column

Time: 2017-07-11 01:34:09

Tags: r dataframe tidyr

I have a data frame of NBA player statistics, scraped from basketball-reference.com, that looks like this:

Player        | Pos | Team | Games | Min  | Points
Alex Abrines  | SG  | OKC  | 68    | 15.5 | 6.0
Quincy Acy    | PF  | TOT  | 38    | 14.7 | 5.8
Quincy Acy    | PF  | DAL  | 6     | 8.0  | 2.2
Quincy Acy    | PF  | BRK  | 32    | 15.9 | 6.5
Steven Adams  | C   | OKC  | 80    | 29.9 | 11.3
Arron Afflalo | SG  | SAC  | 61    | 25.9 | 8.4

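Players who spent the whole season with one team (such as Abrines, Adams, and Afflalo) appear only once. But if a player played for more than one team (such as Quincy Acy), the data frame contains one row for each team he played for, plus an additional "TOT" (total) row.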

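I want to get back a data frame with exactly one row per player, where for a multi-team player that row is the "TOT" row and his other rows are removed. I'm a bit stumped.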

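The most sensible approach is probably to key on "TOT" in the Team column, but one thing that always holds for a player's total row is that its Games value is higher than the Games value in any of that player's other rows.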

1 Answer:

Answer 0: (score: 1)

We can group by Player and Pos and then filter, keeping either the "TOT" row or the only row of a single-row group:

library(dplyr)
df1 %>%
    group_by(Player, Pos) %>%
    filter(Team == "TOT" | n()==1)
# A tibble: 4 x 6
# Groups:   Player, Pos [4]
#        Player   Pos  Team Games   Min Points
#         <chr> <chr> <chr> <int> <dbl>  <dbl>
#1  Alex Abrines    SG   OKC    68  15.5    6.0
#2    Quincy Acy    PF   TOT    38  14.7    5.8
#3  Steven Adams     C   OKC    80  29.9   11.3
#4 Arron Afflalo    SG   SAC    61  25.9    8.4
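
The question also notes that a player's "TOT" row always has the largest Games value among his rows. That suggests a variant that does not rely on the "TOT" label at all. The following is only a sketch: the names df_demo and res are made up for illustration, and it assumes no two rows of the same player tie on Games (a tie would keep both rows).

```r
library(dplyr)

# Hypothetical demo data with the same rows as in the question
df_demo <- data.frame(
  Player = c("Alex Abrines", "Quincy Acy", "Quincy Acy",
             "Quincy Acy", "Steven Adams", "Arron Afflalo"),
  Team   = c("OKC", "TOT", "DAL", "BRK", "OKC", "SAC"),
  Games  = c(68L, 38L, 6L, 32L, 80L, 61L),
  stringsAsFactors = FALSE
)

# Keep, for each player, the row(s) with the highest Games value.
# For single-team players that is their only row; for multi-team
# players it is the season-total row.
res <- df_demo %>%
  group_by(Player) %>%
  filter(Games == max(Games)) %>%
  ungroup()
```

Filtering on Team == "TOT" | n() == 1 is likely the more robust choice when the label is reliable, since it does not depend on the Games column.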

A similar approach with data.table is:

library(data.table)
setDT(df1)[, .SD[Team=="TOT"|.N==1], .(Player, Pos)]
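
If neither package is available, the same rule (keep the "TOT" row, or the only row when a player has just one) can be sketched in base R. This is an illustrative alternative rather than part of the original answer; df_demo, grp_size, and res are hypothetical names, and the sketch groups by Player only.

```r
# Hypothetical demo data with the same rows as in the question
df_demo <- data.frame(
  Player = c("Alex Abrines", "Quincy Acy", "Quincy Acy",
             "Quincy Acy", "Steven Adams", "Arron Afflalo"),
  Team   = c("OKC", "TOT", "DAL", "BRK", "OKC", "SAC"),
  Games  = c(68L, 38L, 6L, 32L, 80L, 61L),
  stringsAsFactors = FALSE
)

# Number of rows each player has, repeated on every row of that player
grp_size <- ave(seq_along(df_demo$Player), df_demo$Player, FUN = length)

# Keep the season-total row, or the single row of a one-team player
res <- df_demo[df_demo$Team == "TOT" | grp_size == 1, ]
```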

Data

df1 <- structure(list(Player = c("Alex Abrines", "Quincy Acy", "Quincy Acy", 
"Quincy Acy", "Steven Adams", "Arron Afflalo"), Pos = c("SG", 
"PF", "PF", "PF", "C", "SG"), Team = c("OKC", "TOT", "DAL", "BRK", 
"OKC", "SAC"), Games = c(68L, 38L, 6L, 32L, 80L, 61L), Min = c(15.5, 
14.7, 8, 15.9, 29.9, 25.9), Points = c(6, 5.8, 2.2, 6.5, 11.3, 
8.4)), .Names = c("Player", "Pos", "Team", "Games", "Min", "Points"
), class = "data.frame", row.names = c(NA, -6L))