Question

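I have a pandas DataFrame whose index and columns are both a MultiIndex, and I want to group it by one level. Specifically, I want to group by one of the header (column) levels, but doing so raises a KeyError and I am not sure why.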

This DataFrame can be used as an example:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))

arrays2 = [['bar','baz','foo','qux'],
       ['one','two','one','two'],
       ['a','b','c','d']]
tuples2 = list(zip(*arrays2))

header = pd.MultiIndex.from_tuples(tuples, names=['h1', 'h2'])
index = pd.MultiIndex.from_tuples(tuples2, names=['first', 'second','third'])

df2=pd.DataFrame(np.random.randn(3, 3), index=index[:3], columns=header[:3])

If I try

df2.groupby('h1',axis=1).sum()

I get a KeyError, yet the same approach works fine on the index:

df2.groupby(df2.index.names[0],axis=0).sum()

What is the reason, and how can I fix it?

Answer 1

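Pass the name through the level argument, so that 'h1' is treated as the name of a column level rather than as a key to look up: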

df2.groupby(level=['h1'],axis=1).sum()
Out[960]: 
h1                       bar       baz
first second third                    
bar   one    a     -1.077170  0.585508
baz   two    b     -3.426262 -0.193342
foo   one    c      1.079590  0.336535
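To make the effect of grouping on a column level easier to check by hand, here is a minimal sketch with made-up, deterministic values instead of random ones. The frame name demo and its contents are my own assumptions, not part of the question. It also uses a transpose instead of axis=1, since the axis argument of groupby is, as far as I know, deprecated in newer pandas releases:

```python
import numpy as np
import pandas as pd

# Same column and index layout as df2, but with known values (illustrative only).
header = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")], names=["h1", "h2"]
)
index = pd.MultiIndex.from_tuples(
    [("bar", "one", "a"), ("baz", "two", "b"), ("foo", "one", "c")],
    names=["first", "second", "third"],
)
demo = pd.DataFrame(np.arange(9.0).reshape(3, 3), index=index, columns=header)

# Transpose so the column levels become index levels, group by the level
# name, sum, and transpose back to the original orientation.
by_h1 = demo.T.groupby(level="h1").sum().T
print(by_h1)
```

Each row of the result holds the sum of the two 'bar' columns under bar and the single 'baz' column under baz, which is the same aggregation the groupby(level=['h1'], axis=1) call above performs.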

Or, in this case, call sum directly with a level:

df2.sum(level=['h2'],axis=1)
Out[965]: 
h2                       one       two
first second third                    
bar   one    a      0.028593 -0.520256
baz   two    b     -3.986019  0.366415
foo   one    c      0.548203  0.867922
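One caveat, to the best of my knowledge: the level argument of DataFrame.sum was deprecated in pandas 1.3 and is no longer accepted from 2.0 onward. A rough equivalent that should behave the same way across versions is to group the transposed frame by the level. The frame below is a hypothetical stand-in with fixed values, not the df2 from the question:

```python
import numpy as np
import pandas as pd

# Stand-in frame with a two-level header (values are illustrative).
header = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")], names=["h1", "h2"]
)
frame = pd.DataFrame(np.arange(9.0).reshape(3, 3), columns=header)

# Sketch of a replacement for frame.sum(level="h2", axis=1):
# group the transposed frame by the h2 level, sum, and transpose back.
by_h2 = frame.T.groupby(level="h2").sum().T
print(by_h2)
```

Here the 'one' column of the result adds up the ('bar', 'one') and ('baz', 'one') columns, while 'two' carries ('bar', 'two') through unchanged.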
