Question

我有一个具有多索引分层列名称的数据框，包括空字符串作为列索引名称。如何对第二和第三列进行子集化？

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5,3),
                         index=[1,2,3,4,5],
                         columns=[['A', 'A', 'B'],
                                  ['a', 'b', ''],
                                  ['',  'x', '']]
)
df.columns.names = ["c_ix0", "c_ix1", "c_ix2"]
print(df)

c_ix0   A       B
c_ix1   a   b    
c_ix2       x    
1       0   1   2
2       3   4   5
3       6   7   8
4       9  10  11
5      12  13  14

预期输出：

c_ix0      A   B
c_ix1      b    
c_ix2      x    
1          1   2
2          4   5
3          7   8
4         10  11
5         13  14

Answer 1

我相信您需要xs：

a = df.xs('b', axis=1, level=1)
print (a)
c_ix0   A
c_ix2   x
1       1
2       4
3       7
4      10
5      13

b = df.xs('B', axis=1, level=0)
print (b)
c_ix1    
c_ix2    
1       2
2       5
3       8
4      11
5      14

如果要按位置选择，请使用iloc：

c = df.iloc[:, 1]
print (c)
1     1
2     4
3     7
4    10
5    13
Name: (A, b, x), dtype: int32

编辑：

d = df.iloc[:, [1, 2]]
print (d)
c_ix0   A   B
c_ix1   b    
c_ix2   x    
1       1   2
2       4   5
3       7   8
4      10  11
5      13  14

子集多索引列

1 个答案: