Question

I have a DataFrame with a MultiIndex:

>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1','ind2'], inplace=True)
>>> df
           col1  col2
ind1 ind2
A    0        2     0
     1        2     2
     2        1     2
B    3        2     2
C    4        4     0
     5        1     4

When I select data on one of the index levels with .loc[] and then apply .query(), the resulting index "shrinks" as expected, so that it matches only the values contained in the resulting DataFrame:

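>>> df.loc['A'].query('col2 == 2')
      col1  col2
ind2
1        2     2
2        1     2
>>> df.loc['A'].query('col2 == 2').index
Int64Index([1, 2], dtype='int64', name='ind2')

However, when I try to get the same result using only .query(), pandas keeps the same index as in the original DataFrame. This differs from the single-index case above, where the resulting index shrank from [0,1,2] to [1,2], matching only the col2 == 2 rows: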

>>> df.query('ind1 == "A" & col2 == 2')
           col1  col2
ind1 ind2
A    1        2     2
     2        1     2
>>> df.query('ind1 == "A" & col2 == 2').index
MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4, 5]],
           labels=[[0, 0], [1, 2]],
           names=['ind1', 'ind2'])

Is this a bug or a feature? If it is a feature, could you please explain this behavior?

EDIT1: I would expect the following index instead:

MultiIndex(levels=[['A'], [1, 2]],
           labels=[[0, 0], [0, 1]],
           names=['ind1', 'ind2'])

EDIT2: As explained in Dataframe Slice does not remove Index Values, index values should not be removed at all when slicing a DF; such behavior should give the following result:

Answer 1

df.loc['A'] returns a DF (or a "view") with a regular ("single") index:

In [12]: df.loc['A']
Out[12]:
      col1  col2
ind2
0        1     1
1        0     3
2        1     2

So .query() is applied to that DF, which has a regular index...
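The distinction the answer draws can be sketched as follows. This is a minimal illustration under stated assumptions: it assumes pandas 0.24 or later (where MultiIndex exposes .codes; the question's output shows the older name labels), and it replaces np.random.randint with the values printed in the question so the run is reproducible. It contrasts the .loc['A'] route with querying the full frame, then uses MultiIndex.remove_unused_levels(), which neither the question nor the answer mentions, as one way to obtain the trimmed index from EDIT1. The variable names sub, via_loc, via_query and trimmed are illustrative only.

```python
import pandas as pd

# Rebuild the question's frame with the values from its printed output
# (instead of np.random.randint) so the results are reproducible.
df = pd.DataFrame({
    "ind1": list("AAABCC"),
    "ind2": range(6),
    "col1": [2, 2, 1, 2, 4, 1],
    "col2": [0, 2, 2, 2, 0, 4],
}).set_index(["ind1", "ind2"])

# 1) .loc['A'] drops the ind1 level, leaving an ordinary single-level index
#    named 'ind2'; filtering it keeps only the surviving labels.
sub = df.loc["A"]
via_loc = sub.query("col2 == 2")
print(sub.index.nlevels, via_loc.index.tolist())

# 2) Querying the full frame keeps the MultiIndex, and the filtered index
#    still carries every level value of the original; only the codes shrink.
via_query = df.query('ind1 == "A" & col2 == 2')
print([level.tolist() for level in via_query.index.levels])

# 3) remove_unused_levels() drops level values that no remaining row uses,
#    which yields the index the question expected in EDIT1.
trimmed = via_query.index.remove_unused_levels()
print([level.tolist() for level in trimmed.levels])
print([codes.tolist() for codes in trimmed.codes])
```

The design point is that filtering rows of a MultiIndex only rewrites its integer codes and leaves the levels untouched, so trimming unused level values is an explicit, separate step (for example df.index = df.index.remove_unused_levels()).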

pandas.DataFrame.query keeps the original MultiIndex