Question

I want to subset a pandas DataFrame based on a condition that is checked only against the first row of each groupby group.

The DataFrame will be grouped by "name", "driverRef", "tyre", "stint".

For example, in the df below, alonso started his 2nd stint in position 12, so I want to remove all records of that stint from the df.

                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9
3  Australian Grand Prix    alonso    2.0  Super soft   20   12
4  Australian Grand Prix    alonso    2.0  Super soft   21   11
5  Australian Grand Prix    alonso    2.0  Super soft   22   10

Expected output:

                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9

I tried this, but it does not give the right result:

df.loc[df.groupby(['name', 'driverRef', 'tyre', 'stint']).first().reset_index()['pos'].isin(list(range(1,11))).index]

Edit: my code did work, but see @jezrael's answer for a more concise / better way to write it.

Answer 1

You were really close; you need transform, which returns a Series with the same length as the original df:

s = df.groupby(['name', 'driverRef', 'tyre', 'stint'])['pos'].transform('first')
print (s)
0     9
1     9
2     9
3    12
4    12
5    12
Name: pos, dtype: int64

df = df[s.isin(list(range(1,11)))]
print (df)
                    name driverRef  stint        tyre  lap  pos
0  Australian Grand Prix    alonso    1.0  Super soft    1    9
1  Australian Grand Prix    alonso    1.0  Super soft    2    9
2  Australian Grand Prix    alonso    1.0  Super soft    3    9
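
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and contrasts first(), which returns one row per group, with transform('first'), which returns one value per original row. The variable names keys, per_group, first_pos and kept are only illustrative. It also uses Series.between(1, 10) in place of isin(list(range(1,11))); the two should be equivalent here only on the assumption that pos holds integers.

```python
import pandas as pd

# Sample frame rebuilt from the question (values copied from the post).
df = pd.DataFrame({
    "name": ["Australian Grand Prix"] * 6,
    "driverRef": ["alonso"] * 6,
    "stint": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    "tyre": ["Super soft"] * 6,
    "lap": [1, 2, 3, 20, 21, 22],
    "pos": [9, 9, 9, 12, 11, 10],
})

keys = ["name", "driverRef", "tyre", "stint"]

# first() collapses each group to a single row, so the result is indexed
# by group and cannot be used directly as a row mask on df.
per_group = df.groupby(keys)["pos"].first()

# transform('first') broadcasts each group's first value back to every
# row of that group and keeps the original index of df.
first_pos = df.groupby(keys)["pos"].transform("first")

# Keep only the groups whose first recorded position is within 1..10
# (between is inclusive on both ends by default).
kept = df[first_pos.between(1, 10)]
print(kept)
```

Because first_pos shares its index with df, it can be used as a boolean mask without any reset_index or .index lookups.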

Python 2.7: Subset a DataFrame based on a condition on the first row of each groupby group
