How to select multiple column names within one specific level using Python pandas

Time: 2016-01-29 05:51:05

Tags: python pandas

I have a DataFrame whose columns are a MultiIndex. I know that when I want to select just one column name at a given level, I can use .xs(), as in the code below.

df.xs('column_name1', level='column_level1', axis=1)

In my case, I want to select multiple column names, as in the code below. (This does not actually work, because .xs() does not support this usage.)

df.xs(['column_name1', 'column_name2'], level='column_level1', axis=1)

How can I select multiple column names within one specific level?

Here is a more concrete example.

import pandas as pd
import io

data = u"""
column_name1,column_name2,column_name3
column_nameA,column_nameB,column_nameC
0.1,1,10
0.2,2,20
0.3,3,30
"""
df = pd.read_csv(io.StringIO(data), header=[0, 1])
df.columns.names = ['column_level1', 'column_level2']
print(df)

df looks like this:

column_level1 column_name1 column_name2 column_name3
column_level2 column_nameA column_nameB column_nameC
0                      0.1            1           10
1                      0.2            2           20
2                      0.3            3           30

And I would like to produce the following data by selecting on column names:

column_level1 column_name1 column_name2
column_level2 column_nameA column_nameB
0                      0.1            1
1                      0.2            2
2                      0.3            3

2 Answers:

Answer 0 (score: 0)

IIUC, you can use loc together with slice (see the docs):

In [58]: df
Out[58]:
first        bar                 baz                 foo                 qux  
second       one       two       one       two       one       two       one       two
0      -0.313815 -0.160567 -0.028432 -1.169930  1.043274  0.353722 -0.912303 -1.041827
1      -0.317570 -0.452766  0.950578  0.467092 -1.960936  1.700110  0.003934  0.989709
2       0.091249  2.406773  1.848771 -1.275288  0.740245  0.657444 -1.157392 -0.103663

In [59]: df.loc[:, (['bar', 'baz'], slice(None))]
Out[59]:
first        bar                 baz
second       one       two       one       two
0      -0.313815 -0.160567 -0.028432 -1.169930
1      -0.317570 -0.452766  0.950578  0.467092
2       0.091249  2.406773  1.848771 -1.275288

For the second level:

In [68]: df.loc[:, (slice(None), ['one', 'two'])]
Out[68]:
first        bar                 baz                 foo                 qux  
second       one       two       one       two       one       two       one       two
0      -0.313815 -0.160567 -0.028432 -1.169930  1.043274  0.353722 -0.912303 -1.041827
1      -0.317570 -0.452766  0.950578  0.467092 -1.960936  1.700110  0.003934  0.989709
2       0.091249  2.406773  1.848771 -1.275288  0.740245  0.657444 -1.157392 -0.103663

EDIT

For your DataFrame:

In [75]: df.loc[:, (slice(None), ['column_nameA', 'column_nameB'])]
Out[75]:
column_level1 column_name1 column_name2
column_level2 column_nameA column_nameB
0                      0.1            1
1                      0.2            2
2                      0.3            3

In [77]: df.loc[:, (['column_name1', 'column_name2'], slice(None))]
Out[77]:
column_level1 column_name1 column_name2
column_level2 column_nameA column_nameB
0                      0.1            1
1                      0.2            2
2                      0.3            3
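
The same selection can also be written with pd.IndexSlice, which is shorthand for building the tuple of lists and slice(None) objects by hand. Below is a minimal sketch, assuming the question's sample data and a reasonably recent pandas on Python 3; the variable names idx and subset are illustrative only.

```python
import io

import pandas as pd

# Sample data from the question: two header rows become a two-level column MultiIndex.
data = """column_name1,column_name2,column_name3
column_nameA,column_nameB,column_nameC
0.1,1,10
0.2,2,20
0.3,3,30
"""
df = pd.read_csv(io.StringIO(data), header=[0, 1])
df.columns.names = ["column_level1", "column_level2"]

# idx[labels, :] builds the same tuple as (labels, slice(None)).
idx = pd.IndexSlice
subset = df.loc[:, idx[["column_name1", "column_name2"], :]]
print(subset)
```

The first position inside idx[...] addresses column_level1 and the second addresses column_level2, so swapping the list and the colon selects on the second level instead.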

Answer 1 (score: 0)

You can try select:

print(df.select(lambda x: x[0] in ['column_name1','column_name2'], axis=1))

column_level1 column_name1 column_name2
column_level2 column_nameA column_nameB
0                      0.1            1
1                      0.2            2
2                      0.3            3
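
Note that DataFrame.select was deprecated in pandas 0.21 and removed in 1.0, so the call above only runs on older versions. A rough equivalent for newer versions, sketched here on the question's sample data, is to evaluate the same predicate over df.columns and pass the resulting boolean list to loc. The names wanted, mask, and subset are illustrative.

```python
import io

import pandas as pd

# Sample data from the question.
data = """column_name1,column_name2,column_name3
column_nameA,column_nameB,column_nameC
0.1,1,10
0.2,2,20
0.3,3,30
"""
df = pd.read_csv(io.StringIO(data), header=[0, 1])
df.columns.names = ["column_level1", "column_level2"]

wanted = {"column_name1", "column_name2"}

# Each column label is a tuple; col[0] is its value on the first level.
mask = [col[0] in wanted for col in df.columns]
subset = df.loc[:, mask]
print(subset)
```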

Or use get_level_values with isin:

print(df.loc[:, df.columns.get_level_values('column_level1')
                          .isin(['column_name1','column_name2'])])

column_level1 column_name1 column_name2
column_level2 column_nameA column_nameB
0                      0.1            1
1                      0.2            2
2                      0.3            3
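
If this kind of selection comes up often, the get_level_values and isin approach can be wrapped in a small helper. The function name select_on_level below is hypothetical and not part of pandas. It is only a sketch, assuming the level is given by name or position and that the original column order should be kept.

```python
import io

import pandas as pd


def select_on_level(frame, level, names):
    """Keep the columns whose label at `level` is one of `names` (hypothetical helper)."""
    mask = frame.columns.get_level_values(level).isin(names)
    return frame.loc[:, mask]


# Sample data from the question.
data = """column_name1,column_name2,column_name3
column_nameA,column_nameB,column_nameC
0.1,1,10
0.2,2,20
0.3,3,30
"""
df = pd.read_csv(io.StringIO(data), header=[0, 1])
df.columns.names = ["column_level1", "column_level2"]

# Select on the first level, then on the second level.
by_first = select_on_level(df, "column_level1", ["column_name1", "column_name2"])
by_second = select_on_level(df, "column_level2", ["column_nameA", "column_nameC"])
print(by_first)
print(by_second)
```

Because the mask is computed from the existing columns, the result keeps the columns in their original order regardless of the order of names.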