Selecting columns from a MultiIndex DataFrame by named level labels

Time: 2017-01-23 14:50:13

Tags: python pandas

I have a DataFrame of this form:

first        bar                           foo                    
second       one       two     three       one       two     three
0      -2.008137  0.505892 -0.671299 -1.289395 -1.087887 -0.146657
1      -0.786329 -0.501268 -1.454408  2.627911  0.689416 -0.877968
2      -0.697007  0.929783  0.181715  0.533407  0.117859 -0.557975
3      -1.276656 -0.405381 -0.674329  0.117411  1.536421  0.040912
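
For anyone who wants to reproduce the question, a frame with the same column layout could be built roughly as follows. The random generator and its seed are assumptions made for illustration, so the values will not match the numbers printed above.

```python
import numpy as np
import pandas as pd

# Column labels in the same order as in the question
# (note that 'one', 'two', 'three' is not lexicographically sorted).
columns = pd.MultiIndex.from_tuples(
    [(f, s) for f in ['bar', 'foo'] for s in ['one', 'two', 'three']],
    names=['first', 'second'],
)

# Random values; the seed is arbitrary, so the numbers differ from the question.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((4, 6)), columns=columns)
print(data)
```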

I want to select data by indexing on the level names, like this:

selected = data.xs(('bar', 'two'), level = ['first','second'], axis=1)

This works. However, I would like to select several labels this way, something like:

selected = data.xs(('bar', ['one','two']), level = ['first','second'], axis=1)

in order to get:

first        bar                 
second       one       two  
0      -2.008137  0.505892 
1      -0.786329 -0.501268 
2      -0.697007  0.929783
3      -1.276656 -0.405381

However, this does not work. How can I select data elegantly in this way? It is important that I can use the level names ('first' and 'second').

2 answers:

Answer 0 (score: 2)

You can use slicers:

# Sort the columns first; otherwise loc raises
# KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted
df = df.sort_index(axis=1)
idx = pd.IndexSlice
print (df.loc[:, idx['bar', ['one','two']]])
first        bar          
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381

Another solution:

df = df.sort_index(axis=1)
print (df.loc[:, ('bar', ['one','two'])])
first        bar          
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381
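
A self-contained sketch of the two `loc` spellings above, checking that they return the same columns. The frame here holds made-up integer data (an assumption for illustration), not the values from the question.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [['bar', 'foo'], ['one', 'two', 'three']], names=['first', 'second']
)
# Made-up values 0..23 so the result is easy to read.
df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=columns)

# Sort the column index first, as in the answer.
df = df.sort_index(axis=1)

idx = pd.IndexSlice
by_slicer = df.loc[:, idx['bar', ['one', 'two']]]
by_tuple = df.loc[:, ('bar', ['one', 'two'])]

# Both spellings select the same two columns.
print(by_slicer.equals(by_tuple))
print(list(by_slicer.columns))
```

Neither form refers to the level names; the position inside the tuple decides which level each label applies to.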

However, if you need to select by level names, use get_level_values together with isin and then boolean indexing (columns are being selected, so loc is needed):

mask1 = df.columns.get_level_values('first') == 'bar'
mask2 = df.columns.get_level_values('second').isin(['one','two'])
print (df.loc[:, mask1 & mask2])
first        bar          
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381
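
The mask approach can be wrapped in a small function so that the level names appear as keyword arguments. This is only a sketch: `select_columns_by_level` is a hypothetical helper, not part of pandas, and the frame uses made-up integer data.

```python
import numpy as np
import pandas as pd

def select_columns_by_level(df, **wanted):
    """Keep columns whose named levels take one of the requested labels.

    Hypothetical helper for illustration; not a pandas function.
    """
    mask = np.ones(len(df.columns), dtype=bool)
    for level_name, labels in wanted.items():
        # Accept a single label or a list of labels.
        if isinstance(labels, str):
            labels = [labels]
        # Combine the per-level conditions with a logical AND.
        mask &= df.columns.get_level_values(level_name).isin(labels)
    return df.loc[:, mask]

columns = pd.MultiIndex.from_product(
    [['bar', 'foo'], ['one', 'two', 'three']], names=['first', 'second']
)
df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=columns)

selected = select_columns_by_level(df, first='bar', second=['one', 'two'])
print(selected)
```

Because the selection goes through a boolean mask, the columns keep their original order and no prior sort_index call is needed.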

Answer 1 (score: 2)

You can use the query method, but it requires transposing, because query filters rows:

data.T.query('first in ["bar", "foo"] and second in ["one", "two"]').T
#    ⤷ transpose here                                transpose back ⤴

You can also define these variables outside of query and reference them with @:

first = ['bar', 'foo']
second = ['one', 'two']
data.T.query('first in @first and second in @second').T
#    ⤷ transpose here                 transpose back ⤴
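
A runnable version of the transposed query, restricted to the columns the question asks for. The frame holds made-up integer data, and `wanted_first` and `wanted_second` are hypothetical variable names chosen so they do not collide with the level names.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [['bar', 'foo'], ['one', 'two', 'three']], names=['first', 'second']
)
data = pd.DataFrame(np.arange(24).reshape(4, 6), columns=columns)

# Hypothetical variable names; '@' tells query to look them up
# in the surrounding Python scope rather than in the frame.
wanted_first = ['bar']
wanted_second = ['one', 'two']

# After transposing, 'first' and 'second' are index level names,
# which query can reference like column names.
selected = data.T.query('first in @wanted_first and second in @wanted_second').T
print(selected)
```

Transposing twice copies the data, and on a frame with mixed column dtypes it may change the dtypes, so this is probably best suited to small, homogeneous frames.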


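Here is a less commonly used alternative. Note that it matches on the text of the column labels and does not use the level names, so it keeps 'one' and 'two' under both 'bar' and 'foo':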

data.filter(regex='one|two')
