Question

I have a Pandas DataFrame with a hierarchical MultiIndex, like this:

In [1]: df
S                         A         A         B         B         C
foo                       1         2         3         4         5 
bar                      10        20        30        40        50 
2016-09-25          0.09321  0.101425  0.129751  0.129751  0.098990
2016-10-06          0.09321  0.101425  0.091678  0.091678  0.030795
2016-10-18          0.09321  0.101425  0.143422  0.143422  0.045204
2016-10-25          0.09321  0.101425  0.103444  0.103444  0.045911

Here S, foo, and bar are the levels of the hierarchical column index, and the dates are the actual DataFrame index.

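I want to group by S and treat the hierarchical index levels the same way as the data, so that a df.sum or df.groupby(level=0, axis=1).sum() style result looks like this, including the foo and bar rows: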

S                         A         B         C
foo                       3         7         5
bar                      30        70        50
2016-09-25         0.194635  0.259502  0.098990
2016-10-06         0.194635  0.183356  0.030795
2016-10-18         0.194635  0.286844  0.045204
2016-10-25         0.194635  0.206887  0.045911

Answer 1

Let's try this (note: if levels 1 and 2 of the column index already have an int dtype, the .apply(pd.to_numeric) call may not be needed):

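import pandas as pd

# Average every date column; sum the former 'foo' and 'bar' levels
dict1 = dict((i,'mean') for i in df.index)
dict1['foo'] = 'sum'
dict1['bar'] = 'sum'

# Transpose so the column levels become ordinary columns, group by S,
# then restore 'foo' and 'bar' as column levels and transpose back
df.T.reset_index().apply(pd.to_numeric)\
  .groupby('S').agg(dict1)\
  .set_index(['foo','bar'], append=True).T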

Output:

S                 13        14        15
foo               49        53        28
bar              202       215       94 
2016-10-06  0.097318  0.091678  0.030795
2016-10-18  0.097318  0.143422  0.045204
2016-09-25  0.097318  0.129751  0.098990
2016-10-25  0.097318  0.103444  0.045911

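With the new data from the question (S now holds letters, so only levels 1 and 2 are reset and S stays in the index, out of reach of pd.to_numeric):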

dict1 = dict((i,'mean') for i in df.index)
dict1['foo'] = 'sum'
dict1['bar'] = 'sum'

print(df.T.reset_index(level=[1,2]).apply(pd.to_numeric)
        .groupby('S').agg(dict1)
        .set_index(['foo','bar'], append=True).T)

Output:

S                  A         B         C
foo                3         7         5
bar               30        70        50
2016-10-06  0.097318  0.091678  0.030795
2016-10-18  0.097318  0.143422  0.045204
2016-09-25  0.097318  0.129751  0.098990
2016-10-25  0.097318  0.103444  0.045911
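
The snippets above aggregate the date columns with 'mean', while the question asks for sums. A minimal sketch of the same transpose-and-group pattern using sum throughout is shown below. It rebuilds the question's frame from the printed values; using plain strings as the date labels is an assumption made only to keep the example self-contained, and the names `columns`, `dates`, `data`, and `result` are illustrative.

```python
import pandas as pd

# Rebuild the question's frame: columns are a three-level MultiIndex (S, foo, bar).
columns = pd.MultiIndex.from_tuples(
    [("A", 1, 10), ("A", 2, 20), ("B", 3, 30), ("B", 4, 40), ("C", 5, 50)],
    names=["S", "foo", "bar"],
)
# Assumption: dates are kept as plain strings here.
dates = ["2016-09-25", "2016-10-06", "2016-10-18", "2016-10-25"]
data = [
    [0.09321, 0.101425, 0.129751, 0.129751, 0.098990],
    [0.09321, 0.101425, 0.091678, 0.091678, 0.030795],
    [0.09321, 0.101425, 0.143422, 0.143422, 0.045204],
    [0.09321, 0.101425, 0.103444, 0.103444, 0.045911],
]
df = pd.DataFrame(data, index=dates, columns=columns)

result = (
    df.T                                    # column levels become the row index
    .reset_index(level=["foo", "bar"])      # foo and bar become ordinary columns
    .groupby(level="S")                     # S stays in the index and is the group key
    .sum()                                  # sums foo, bar, and every date column
    .set_index(["foo", "bar"], append=True) # restore foo and bar as index levels
    .T                                      # back to dates as rows
)

print(result)
```

Because foo and bar travel through the groupby as regular columns, they are summed together with the data, which should give the layout requested in the question (A with foo 3 and bar 30, and so on).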

Pandas: group by a hierarchical MultiIndex without losing the other levels
