Question

Given the following pandas DataFrame:

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]

tuples = list(zip(*arrays))
df_index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B', 'measure'])

df = pd.DataFrame(np.random.randn(8, 4), index=df_index)
print(df)

How can I filter the rows where, for example, measure (which is part of the index, not a regular column) is greater than 0.2?

I tried:

df.loc[:,:,0.1:0.9]

(and other variants), but I get the error "IndexingError: Too many indexers".

Thanks, Gerard
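For context on the error: `df.loc[:,:,0.1:0.9]` hands three indexers to a two-dimensional DataFrame, which is why pandas reports too many indexers. A minimal sketch of the usual fix, assuming the index is sorted first: wrap the per-level slices in a single tuple for the row axis and pass `:` for the columns. The random data is seeded only so the snippet is reproducible, and `subset` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (seeded only so the example is reproducible)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]
df_index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['A', 'B', 'measure'])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((8, 4)), index=df_index)

# One tuple for the rows (one slice per index level), then ':' for the columns.
# Label slices in .loc include both endpoints, so the 0.9 row is kept.
subset = df.sort_index().loc[(slice(None), slice(None), slice(0.1, 0.9)), :]
print(subset)
```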

Answer 1

In [3]: df.query("measure > 0.2")
Out[3]:
                        0         1         2         3
A   B   measure
baz one 0.3      0.623507  0.602585 -0.792142  2.066095
foo one 0.6      0.138192 -0.159108 -1.796944  1.668463
qux two 0.9     -0.162210 -2.293951  0.602990  1.622783

Or:

In [6]: df.loc[pd.IndexSlice[:,:,0.200001:], :]
Out[6]:
                        0         1         2         3
A   B   measure
baz one 0.3      0.623507  0.602585 -0.792142  2.066095
foo one 0.6      0.138192 -0.159108 -1.796944  1.668463
qux two 0.9     -0.162210 -2.293951  0.602990  1.622783
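`DataFrame.query` resolves `measure` to the index level of that name, and the same expression can mix several index levels or pull in a local Python variable with `@`. A small sketch, where `threshold` is a hypothetical variable name chosen for illustration and the data is zero-filled because only the index matters:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]
df_index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['A', 'B', 'measure'])
df = pd.DataFrame(np.zeros((8, 4)), index=df_index)

threshold = 0.2  # hypothetical variable holding the cut-off

# 'measure' refers to the index level; '@threshold' reads the Python variable
above = df.query("measure > @threshold")
# Conditions on other index levels can go in the same expression
above_one = df.query("measure > @threshold and B == 'one'")
print(above)
print(above_one)
```

Because `query` evaluates a strict `>`, it avoids the `0.200001` workaround needed in the slicing variant.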

Answer 2

Something like get_level_values:

df[df.index.get_level_values(2)>0.2]
Out[35]: 
                        0         1         2         3
A   B   measure                                        
baz one 0.3     -0.235196  0.183122 -1.620810  0.912996
foo one 0.6     -1.456278 -1.144081 -0.872170  0.547008
qux two 0.9      0.942656 -0.435219 -0.161408 -0.451456
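`get_level_values` accepts either a level position or a level name, and the arrays it returns combine with `&` and `|` like any boolean mask. A sketch using zeros for the data (only the index matters here); the extra condition on level `A` is an illustrative assumption, not part of the question:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]
df_index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['A', 'B', 'measure'])
df = pd.DataFrame(np.zeros((8, 4)), index=df_index)

# Look the level up by name; equivalent to get_level_values(2)
measure = df.index.get_level_values('measure')
# Combine with a condition on another level (parentheses are required around each part)
mask = (measure > 0.2) & (df.index.get_level_values('A') != 'qux')
print(df[mask])
```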

Answer 3

This does the trick:

df.iloc[df.index.get_level_values(2) >= 0.2]

Or, if you prefer:

df.iloc[df.index.get_level_values('measure') >= 0.2]
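Two details are worth checking in this variant: `>=` keeps the row whose measure is exactly 0.2, which the question's strict "greater than" would drop, and `.loc` accepts the same boolean array as `.iloc`. A sketch on the question's index with zero-filled data (`mask_ge`, `via_iloc` and `via_loc` are illustrative names):

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]
df_index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['A', 'B', 'measure'])
df = pd.DataFrame(np.zeros((8, 4)), index=df_index)

mask_ge = df.index.get_level_values('measure') >= 0.2
via_iloc = df.iloc[mask_ge]
via_loc = df.loc[mask_ge]

# Both indexers select the same rows from a boolean NumPy array
print(via_iloc.equals(via_loc))  # True
# >= keeps ('bar', 'two', 0.2); a strict > 0.2 would drop it
print(via_iloc.index.get_level_values('measure').tolist())
```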

Answer 4

Building on your initial approach, you can use IndexSlice:

df.sort_index().loc[pd.IndexSlice[:, :, 0.2:], :]
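The `sort_index()` call matters because slicing on an inner MultiIndex level generally requires a lexsorted index; on an unsorted one pandas may raise `UnsortedIndexError`. A sketch that reverses the rows to simulate an unsorted frame (`shuffled` is a hypothetical name) and then applies the same pattern. As with any `.loc` label slice, `0.2:` includes 0.2 itself.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]
df_index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['A', 'B', 'measure'])
df = pd.DataFrame(np.zeros((8, 4)), index=df_index)

# Reverse the row order so the index is no longer sorted
shuffled = df.iloc[::-1]
print(shuffled.index.is_monotonic_increasing)  # False

# Sort first, then slice the third level from 0.2 upwards (0.2 included)
result = shuffled.sort_index().loc[pd.IndexSlice[:, :, 0.2:], :]
print(result)
```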

Filter values in a pandas DataFrame by a condition on an index column
