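I have the following dataframe in Pandas:
letter number
------ -------
a 2
a 0
b 1
b 5
b 2
c 1
c 0
c 2
I want to keep all rows of a letter if at least one of that letter's numbers is 0. The result would be:
letter number
------ -------
a 2
a 0
c 1
c 0
c 2
b is dropped because none of its numbers is 0.
What is the best way to do this? Thanks!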
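To reproduce the answers, the sample data can be rebuilt as below. This is a minimal sketch that assumes a default RangeIndex (0 to 7), which matches the row labels printed in the answers.

```python
import pandas as pd

# Sample data from the question: one row per (letter, number) pair
df = pd.DataFrame({
    'letter': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'number': [2, 0, 1, 5, 2, 1, 0, 2],
})
print(df)
```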
Answer 0 (score: 6)
You need filtration, i.e. groupby with filter:
df = df.groupby('letter').filter(lambda x: (x['number'] == 0).any())
print (df)
letter number
0 a 2
1 a 0
5 c 1
6 c 0
7 c 2
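A self-contained version of the filtration approach, for running it outside the original session. The second call with dropna=False is an addition that is not part of the answer: it shows that filter can keep the original shape and fill the failing group's rows with NaN instead of dropping them (worth checking against your pandas version).

```python
import pandas as pd

df = pd.DataFrame({
    'letter': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'number': [2, 0, 1, 5, 2, 1, 0, 2],
})

# filter() calls the lambda once per group (a sub-DataFrame) and keeps
# all rows of that group only when the lambda returns True
kept = df.groupby('letter').filter(lambda g: (g['number'] == 0).any())
print(kept)

# dropna=False keeps every row; rows of groups that fail become NaN
masked = df.groupby('letter').filter(lambda g: (g['number'] == 0).any(),
                                     dropna=False)
print(masked)
```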
Another solution uses transform: count the 0 values in each group and filter with boolean indexing:
print (df.groupby('letter')['number'].transform(lambda x: (x == 0).sum()))
0 1
1 1
2 0
3 0
4 0
5 1
6 1
7 1
Name: number, dtype: int64
df = df[df.groupby('letter')['number'].transform(lambda x: (x == 0).sum()) > 0]
print (df)
letter number
0 a 2
1 a 0
5 c 1
6 c 0
7 c 2
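A variant of the transform idea that is not from the original answer: instead of counting zeros with a Python lambda, build a boolean mask first and let the built-in 'any' aggregation broadcast the per-letter result back to every row. Avoiding a Python call per group is usually faster when there are many groups, though that is an expectation and not measured here.

```python
import pandas as pd

df = pd.DataFrame({
    'letter': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'number': [2, 0, 1, 5, 2, 1, 0, 2],
})

# True where number == 0, grouped by letter; transform('any') returns
# one boolean per original row: "does this row's letter contain a 0?"
has_zero = df['number'].eq(0).groupby(df['letter']).transform('any')
print(has_zero.tolist())

out = df[has_zero]
print(out)
```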
Edit:
df1 = df[df['letter'].isin(df.loc[df['number'] == 0, 'letter'])]
print (df1)
letter number
0 a 2
1 a 0
5 c 1
6 c 0
7 c 2
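The same isin idea as a self-contained sketch. Calling unique() on the selected letters is an optional addition: isin gives the same result with or without duplicates, but deduplicating makes the intermediate easier to inspect.

```python
import pandas as pd

df = pd.DataFrame({
    'letter': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'number': [2, 0, 1, 5, 2, 1, 0, 2],
})

# Letters that have at least one row with number == 0
letters_with_zero = df.loc[df['number'] == 0, 'letter'].unique()
print(letters_with_zero)

# Keep every row whose letter is in that set
df1 = df[df['letter'].isin(letters_with_zero)]
print(df1)
```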
Compared with the solution from the other answer:
In [412]: %timeit df[df['letter'].isin(df[df['number'] == 0]['letter'])]
1000 loops, best of 3: 815 µs per loop
In [413]: %timeit df[df['letter'].isin(df.loc[df['number'] == 0, 'letter'])]
1000 loops, best of 3: 657 µs per loop
Answer 1 (score: 3)
You can also do this without groupby: first determine which letters to keep, then use isin. I think this one is a bit neater:
>>> letters_to_keep = df[df['number'] == 0]['letter']
>>> df_reduced = df[df['letter'].isin(letters_to_keep)]
>>> df_reduced
letter number
0 a 2
1 a 0
5 c 1
6 c 0
7 c 2
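The two lines above generalize to any grouping column, value column and target value. Here is a small sketch; keep_groups_containing is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def keep_groups_containing(frame, group_col, value_col, value):
    """Hypothetical helper: keep every row whose group (group_col) has at
    least one row where value_col == value."""
    groups_to_keep = frame.loc[frame[value_col] == value, group_col]
    return frame[frame[group_col].isin(groups_to_keep)]


df = pd.DataFrame({
    'letter': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'number': [2, 0, 1, 5, 2, 1, 0, 2],
})

print(keep_groups_containing(df, 'letter', 'number', 0))  # letters a and c
print(keep_groups_containing(df, 'letter', 'number', 5))  # letter b only
```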
I suspect this is faster than doing a groupby, although that may not matter here! A quick timeit suggests that it is:
>>> %%timeit
... df.groupby('letter').filter(lambda x: (x['number'] == 0).any())
100 loops, best of 3: 2.26 ms per loop
>>> %%timeit
... df[df['letter'].isin(df[df['number'] == 0]['letter'])]
1000 loops, best of 3: 820 µs per loop
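Timings depend on the pandas version, the machine and the data size. The sketch below re-runs the comparison of all four approaches from this thread outside IPython with the standard timeit module. The setting of 100 calls × 3 repeats mirrors the "100 loops, best of 3" shown above but is otherwise arbitrary. On an 8-row frame per-call overhead dominates, so try a larger frame before drawing conclusions.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'letter': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'number': [2, 0, 1, 5, 2, 1, 0, 2],
})

# The approaches from the two answers, wrapped as zero-argument callables
candidates = {
    'groupby.filter': lambda: df.groupby('letter').filter(
        lambda x: (x['number'] == 0).any()),
    'groupby.transform': lambda: df[df.groupby('letter')['number'].transform(
        lambda x: (x == 0).sum()) > 0],
    'isin (chained [])': lambda: df[df['letter'].isin(
        df[df['number'] == 0]['letter'])],
    'isin (.loc)': lambda: df[df['letter'].isin(
        df.loc[df['number'] == 0, 'letter'])],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 100 calls each, reported as seconds per call
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    results[name] = best
    print(f'{name:20s} {best * 1e6:8.1f} us per call')
```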