Can someone explain why I get NaN from the following?
Here is my DataFrame:
import pandas as pd

df2 = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo', 'jack'],
                    'B': ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three', 'four']})
Then I group by column 'A':
df3 = df2.groupby('A')
for A, group in df3:
    print(A)
    print(group)
Result:
bar
A B
1 bar one
3 bar three
5 bar two
foo
A B
0 foo one
2 foo two
4 foo two
6 foo one
7 foo three
jack
A B
8 jack four
So far so good. What I want back is the set of groups restricted to the rows where column 'B' is 'one' or 'two':
df4 = df3.apply(lambda x: (x[x['B'] == 'one']) | (x[x['B'] == 'two']))
The result I get is:
A B
A
bar 1 NaN NaN
5 NaN NaN
foo 0 NaN NaN
2 NaN NaN
4 NaN NaN
Answer 0 (score: 1)
Why not filter the rows before grouping?
pd.concat({k : g for k, g in df2[df2.B.isin(['one', 'two'])].groupby('A')})
A B
bar 1 bar one
5 bar two
foo 0 foo one
2 foo two
4 foo two
6 foo one
If you just want the separate groups without concatenating them, stop at
groups = {k : g for k, g in df2[df2.B.isin(['one', 'two'])].groupby('A')}
and access each group with groups['bar'] or groups['foo'].
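The dictionary approach above can be sketched end to end as follows. The names `wanted` and `filtered` are introduced here only for illustration and are not part of the answer. Since 'jack' has no row with 'one' or 'two', it should not appear as a key, so `dict.get` is used for that lookup.

```python
import pandas as pd

df2 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'jack'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'four'],
})

wanted = ['one', 'two']                   # illustrative name, not from the answer
filtered = df2[df2['B'].isin(wanted)]     # drop non-matching rows first

# One DataFrame per remaining value of A, keyed by that value.
groups = {key: group for key, group in filtered.groupby('A')}

print(sorted(groups))        # keys that survived the filter
print(groups['bar'])         # rows of the 'bar' group
print(groups.get('jack'))    # no matching rows, so no such key
```

Filtering before grouping keeps the grouped object small and avoids building empty groups that would have to be discarded afterwards.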
Answer 1 (score: 0)
Another way is to use groupby and apply:
df2.groupby('A').apply(lambda x: x[x['B'].isin(['one','two'])])
A B
A
bar 1 bar one
5 bar two
foo 0 foo one
2 foo two
4 foo two
6 foo one
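As for the NaN in the question, a likely cause is that `|` is applied there to two already-filtered DataFrames rather than to two boolean masks. pandas aligns DataFrames on their row labels before applying a binary operator, and no row can be both 'one' and 'two', so the two frames share no labels, which would explain the all-NaN result. A minimal sketch of the alternative, assuming the goal is simply the rows matching either value, combines the conditions first and indexes once (the names `mask` and `selected` are illustrative):

```python
import pandas as pd

df2 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'jack'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'four'],
})

# Combine the two boolean Series; the parentheses are required because
# | binds more tightly than == in Python.
mask = (df2['B'] == 'one') | (df2['B'] == 'two')

# Index with the combined mask exactly once.
selected = df2[mask]

# Group the surviving rows as in the question.
for key, group in selected.groupby('A'):
    print(key)
    print(group)
```

This selects the same rows as `isin(['one', 'two'])`; `isin` tends to be easier to read once more than two values are involved.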