I have a DataFrame:
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df.columns = ['a','b','c','d']
df
Out[3]:
                a         b         c         d
bar one  0.346640 -0.908057  1.327248 -0.600094
    two -0.623039 -0.146015 -0.295474  0.283444
baz one -0.468552  0.319582  0.293260 -0.329329
    two -0.912441  0.719779 -2.136825  0.997689
foo one -0.839984 -1.186596 -0.458738  0.661190
    two -0.480537  0.514584 -0.284970 -1.871232
qux one  0.079585 -1.062287  0.075252  0.041869
    two -2.285919 -0.697770  0.443770  0.072648
When I try some boolean indexing, it works:
filtered_df = df.loc[df['b'] < -1]
filtered_df
Out[5]:
                a         b         c         d
foo one -0.839984 -1.186596 -0.458738  0.661190
qux one  0.079585 -1.062287  0.075252  0.041869
So why does the index attribute still contain all the index values of the original DataFrame? Why is 'baz' still in there, for example?
filtered_df.index
Out[6]:
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
labels=[[2, 3], [0, 0]])
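A minimal sketch of what the output above is showing, assuming pandas 0.24 or later (where the `labels` attribute is called `codes`; `MultiIndex.remove_unused_levels` has existed since 0.20). The `levels` are the full set of possible values per level, and the codes point into them: `[[2, 3], [0, 0]]` means ('foo', 'one') and ('qux', 'one'). Filtering only changes the codes, so 'bar' and 'baz' remain in `levels` even though no row uses them. To keep the example reproducible, the data below reuses only the 'b' column values from the question instead of random numbers.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
# Fixed 'b' values copied from the question's output, so the filter is deterministic
b_values = [-0.908057, -0.146015, 0.319582, 0.719779,
            -1.186596, 0.514584, -1.062287, -0.697770]
df = pd.DataFrame({'b': b_values}, index=arrays)

filtered_df = df.loc[df['b'] < -1]

# Levels are index metadata and are not pruned by boolean filtering
print(filtered_df.index.levels[0].tolist())            # ['bar', 'baz', 'foo', 'qux']
# Codes index into the levels; this is what older pandas printed as `labels`
print([c.tolist() for c in filtered_df.index.codes])   # [[2, 3], [0, 0]]
# The labels that actually appear in the remaining rows
print(filtered_df.index.get_level_values(0).tolist())  # ['foo', 'qux']

# Rebuild the levels from only the values that are in use
pruned = filtered_df.index.remove_unused_levels()
print(pruned.levels[0].tolist())                       # ['foo', 'qux']
print(pruned.levels[1].tolist())                       # ['one']
```

If the pruned index is needed on the frame itself, it can be assigned back with `filtered_df.index = pruned`.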