I have a DataFrame of this form:
first        bar                           foo
second       one       two     three       one       two     three
0      -2.008137  0.505892 -0.671299 -1.289395 -1.087887 -0.146657
1      -0.786329 -0.501268 -1.454408  2.627911  0.689416 -0.877968
2      -0.697007  0.929783  0.181715  0.533407  0.117859 -0.557975
3      -1.276656 -0.405381 -0.674329  0.117411  1.536421  0.040912
I want to select data by index labels, addressing the levels by name, like this:
selected = data.xs(('bar', 'two'), level = ['first','second'], axis=1)
This works. However, I would like to select several labels at once in the same way. Something like:
selected = data.xs(('bar', ['one','two']), level = ['first','second'], axis=1)
in order to get:
first        bar
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381
However, this does not work. How can I select data this way elegantly? It is important that I can use the level names ('first' and 'second').
Answer 0: (score: 2)
You can use slicers (df below is the question's data):
import pandas as pd

# Sort the columns first. Without this, slicing can raise:
# KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted
df = df.sort_index(axis=1)
idx = pd.IndexSlice
print(df.loc[:, idx['bar', ['one', 'two']]])
first        bar
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381
Another solution:
df = df.sort_index(axis=1)
print (df.loc[:, ('bar', ['one','two'])])
first        bar
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381
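For anyone who wants to try the two loc variants above, here is a minimal, self-contained sketch. It assumes the frame can be rebuilt with MultiIndex.from_product using the values printed in the question; the question does not show how data was actually created.

```python
import pandas as pd

# Assumption: rebuild the question's frame from the printed values.
columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three"]], names=["first", "second"]
)
data = pd.DataFrame(
    [
        [-2.008137, 0.505892, -0.671299, -1.289395, -1.087887, -0.146657],
        [-0.786329, -0.501268, -1.454408, 2.627911, 0.689416, -0.877968],
        [-0.697007, 0.929783, 0.181715, 0.533407, 0.117859, -0.557975],
        [-1.276656, -0.405381, -0.674329, 0.117411, 1.536421, 0.040912],
    ],
    columns=columns,
)

# Sort the column MultiIndex before slicing with a list of labels.
df = data.sort_index(axis=1)

# Variant 1: IndexSlice. Positions in the key follow the level order.
idx = pd.IndexSlice
selected = df.loc[:, idx["bar", ["one", "two"]]]

# Variant 2: a plain tuple as the column key.
selected_tuple = df.loc[:, ("bar", ["one", "two"])]

print(selected)
```

Note that both variants address the levels by position in the key, not by name, which is why the answer goes on to a name-based approach.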
However, if you need to select by level names, use get_level_values with isin, and then boolean indexing (columns are being selected, so loc is needed):
mask1 = df.columns.get_level_values('first') == 'bar'
mask2 = df.columns.get_level_values('second').isin(['one','two'])
print (df.loc[:, mask1 & mask2])
first        bar
second       one       two
0      -2.008137  0.505892
1      -0.786329 -0.501268
2      -0.697007  0.929783
3      -1.276656 -0.405381
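If this pattern comes up often, the mask logic could be wrapped in a small function. The sketch below is only an illustration: select_columns_by_level is a hypothetical helper name, not part of pandas, and the demo frame uses made-up integer values instead of the question's data.

```python
import numpy as np
import pandas as pd


def select_columns_by_level(frame, **wanted):
    """Hypothetical helper (not a pandas API).

    Each keyword is a column level name mapped to one label or a list
    of labels; a column is kept only if it matches on every level given.
    """
    mask = np.ones(len(frame.columns), dtype=bool)
    for level_name, labels in wanted.items():
        if isinstance(labels, str):
            labels = [labels]  # allow a single label as shorthand
        mask &= frame.columns.get_level_values(level_name).isin(labels)
    return frame.loc[:, mask]


# Demo frame with the same column layout as the question (values are made up).
columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three"]], names=["first", "second"]
)
data = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

selected = select_columns_by_level(data, first="bar", second=["one", "two"])
print(selected)
```

Because the selection is a boolean mask, no prior sort_index call is needed here. Level names that are not valid Python identifiers would have to be passed as an unpacked dict, e.g. select_columns_by_level(data, **{"my level": ["a"]}).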
Answer 1: (score: 2)
You can use the query method, but it requires transposing, because query filters rows:
data.T.query('first in ["bar", "foo"] and second in ["one", "two"]').T
# data.T transposes before the query; the trailing .T transposes back
Or you can set these variables outside of query and reference them with @:
first = ['bar', 'foo']
second = ['one', 'two']
data.T.query('first in @first and second in @second').T
# data.T transposes before the query; the trailing .T transposes back
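A self-contained sketch of the transpose-and-query idea follows. The function name select_with_query, the parameter names, and the integer demo values are assumptions for illustration only; the parameters are deliberately named differently from the level names so it is clear which side of each "in" is the index level and which is the local variable.

```python
import numpy as np
import pandas as pd


def select_with_query(frame, first_labels, second_labels):
    """Hypothetical helper: filter columns by level name via query.

    query works on rows, so the frame is transposed first; the column
    levels 'first' and 'second' then become index levels that the
    query string can refer to by name.
    """
    picked = frame.T.query(
        "first in @first_labels and second in @second_labels"
    )
    return picked.T  # transpose back to the original orientation


# Demo frame with the same column layout as the question (values are made up).
columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three"]], names=["first", "second"]
)
data = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

selected = select_with_query(data, ["bar"], ["one", "two"])
print(selected)
```

One thing to keep in mind: transposing a frame whose columns have mixed dtypes converts everything to object dtype, so this approach is most comfortable when all columns share one dtype.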
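Here is a less commonly used alternative (it matches a regex against the column labels rather than using the level names):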
data.filter(regex='one|two')