Pandas: keep the DataFrames in a dict that do not contain certain values

Time: 2018-01-02 13:56:40

Tags: python pandas dictionary

I have a dictionary named dfs whose values are DataFrames like this one:

    team_id  player_id     x_loc     y_loc   radius  game_clock  shot_clock  \
1        -1         -1  27.91690  41.37191  4.18103      710.78       11.71   
2        -1         -1  31.90677  36.18951  3.47588      710.30       11.44   
3        -1         -1  34.13352  27.62760  1.17149      709.82       11.16   
4        -1         -1  34.74723  23.90685  3.42091      709.34       10.88   
5        -1         -1  24.68878  15.18316  5.02066      708.86       10.60   
6        -1         -1  17.59483   9.16468  3.03803      708.38       10.32   
7        -1         -1  18.69309  12.53733  2.22372      707.90       10.04   
8        -1         -1  16.23927  17.82597  5.45565      707.42        9.77   
9        -1         -1   9.84219   8.62434  8.59493      706.94        9.49   
10       -1         -1   5.73599   3.83553  4.77459      706.46        9.21   
11       -1         -1   5.49103   3.97060  4.82267      705.98        8.93   
12       -1         -1   2.44574   3.85045  0.84340      705.50        8.65   
13       -1         -1  30.44487  43.11858  7.48128      713.02       13.01   

    quarter   game_id  event_id   GAME_ID  EVENTMSGTYPE  PLAYER1_TEAM_ID  
1         1  21500492         1  21500492           NaN     1.610613e+09  
2         1  21500492         1  21500492           NaN     1.610613e+09  
3         1  21500492         1  21500492           NaN     1.610613e+09  
4         1  21500492         1  21500492           NaN     1.610613e+09  
5         1  21500492         1  21500492           NaN     1.610613e+09  
6         1  21500492         1  21500492           NaN     1.610613e+09  
7         1  21500492         1  21500492           NaN     1.610613e+09  
8         1  21500492         1  21500492           NaN     1.610613e+09  
9         1  21500492         1  21500492           NaN     1.610613e+09  
10        1  21500492         1  21500492           NaN     1.610613e+09  
11        1  21500492         1  21500492           NaN     1.610613e+09  
12        1  21500492         1  21500492           NaN     1.610613e+09  
13        1  21500492         2  21500492           2.0     1.610613e+09  

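I want to find the ones that do not contain any of the values [3,5,6,7,8,9,10,11,12,13] in the EVENTMSGTYPE column and store them in a new dictionary, but I can't seem to find a way to do it.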

1 answer:

Answer 0: (score: 1)

I think you need a dictionary comprehension that filters each DataFrame with boolean indexing and isin; ~ inverts the boolean mask:

vals = [3, 5, 6, 7, 8, 9, 10, 11, 12, 13] 

d1 = {k:df[~df['EVENTMSGTYPE'].isin(vals)] for k, df in dfs.items()}
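As a minimal sketch of what that comprehension does, here is a run on made-up data. The two small DataFrames and their keys "a" and "b" are hypothetical stand-ins, not the real tracking data. Note that isin treats NaN as not being in vals, so NaN rows are kept.

```python
import numpy as np
import pandas as pd

vals = [3, 5, 6, 7, 8, 9, 10, 11, 12, 13]

# Hypothetical toy frames standing in for the real ones in dfs.
dfs = {
    "a": pd.DataFrame({"event_id": [1, 1, 2],
                       "EVENTMSGTYPE": [np.nan, 2.0, 3.0]}),
    "b": pd.DataFrame({"event_id": [3, 3],
                       "EVENTMSGTYPE": [5.0, 6.0]}),
}

# isin() marks rows whose value is in vals; ~ flips the mask,
# so only rows with a value outside vals (or NaN) survive.
d1 = {k: df[~df["EVENTMSGTYPE"].isin(vals)] for k, df in dfs.items()}

print(d1["a"])       # the NaN row and the 2.0 row remain
print(len(d1["b"]))  # every row of "b" was filtered out
```

Every key is still present in d1 here; a frame whose rows are all excluded simply becomes an empty DataFrame.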

Or use query:

d1 = {k:df.query('EVENTMSGTYPE not in @vals') for k, df in dfs.items()}
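To check that the query form selects the same rows as the isin mask, a small comparison on a made-up frame can help. The data is hypothetical; @vals refers to the local Python variable vals.

```python
import numpy as np
import pandas as pd

vals = [3, 5, 6, 7, 8, 9, 10, 11, 12, 13]

# Hypothetical toy frame, not the real data.
df = pd.DataFrame({"event_id": [1, 1, 2, 2],
                   "EVENTMSGTYPE": [np.nan, 2.0, 3.0, 13.0]})

# Same filter written two ways.
by_mask = df[~df["EVENTMSGTYPE"].isin(vals)]
by_query = df.query("EVENTMSGTYPE not in @vals")

print(by_mask.equals(by_query))
```

Which form to use is mostly a matter of taste; the mask version avoids parsing a string expression, while query can read more naturally.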

To also leave out the DataFrames that would be empty after filtering, add a condition:

d1 = {k:df[~df['EVENTMSGTYPE'].isin(vals)] for k, df in dfs.items()
        if not df['EVENTMSGTYPE'].isin(vals).all()}
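A possible variant, if calling isin twice per DataFrame bothers you, is a plain loop that builds the mask once and reuses it. This is only a sketch on hypothetical toy data; `keep.any()` is equivalent to the `not ... .all()` condition above.

```python
import pandas as pd

vals = [3, 5, 6, 7, 8, 9, 10, 11, 12, 13]

# Hypothetical toy frames standing in for the real ones in dfs.
dfs = {
    "a": pd.DataFrame({"EVENTMSGTYPE": [1.0, 2.0, 3.0]}),
    "b": pd.DataFrame({"EVENTMSGTYPE": [5.0, 6.0]}),
}

d1 = {}
for k, df in dfs.items():
    # Rows to keep: value not in vals.
    keep = ~df["EVENTMSGTYPE"].isin(vals)
    # Store the filtered frame only if at least one row survives.
    if keep.any():
        d1[k] = df[keep]

print(list(d1))
```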

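EDIT: To keep only the DataFrames in which no row has one of these values, stored under new consecutive integer keys: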

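d1 = {}
last = 0  # next integer key in the new dictionary
for k, df in dfs.items():
    # True for rows whose EVENTMSGTYPE is not in vals (NaN counts as not in vals)
    m = ~df['EVENTMSGTYPE'].isin(vals)
    # keep the whole DataFrame only if no row has an excluded value
    if m.all():
        d1[last] = df
        last += 1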
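If you would rather keep the original keys than renumber them, the same whole-DataFrame test might be written as a dictionary comprehension. This is a sketch on hypothetical toy data, assuming the goal is to keep a DataFrame only when none of its rows has a value from vals.

```python
import numpy as np
import pandas as pd

vals = [3, 5, 6, 7, 8, 9, 10, 11, 12, 13]

# Hypothetical toy frames; the keys are made up for illustration.
dfs = {
    "clean": pd.DataFrame({"EVENTMSGTYPE": [np.nan, 2.0]}),
    "mixed": pd.DataFrame({"EVENTMSGTYPE": [2.0, 3.0]}),
    "bad": pd.DataFrame({"EVENTMSGTYPE": [5.0, 6.0]}),
}

# Keep a frame, unchanged and under its original key,
# only if no row has an excluded value.
d1 = {k: df for k, df in dfs.items()
      if not df["EVENTMSGTYPE"].isin(vals).any()}

print(list(d1))
```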