Pandas: sort a MultiIndex by group sums

Posted: 2016-06-20 21:21:57

Tags: python sorting pandas dataframe multi-index

Given the following data frame:

import pandas as pd
df=pd.DataFrame({'County':['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B'],
                'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
                'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
                'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
                        '2013','2013','2013','2013','2013','2013','2013','2013']})
d2 = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'])  # default aggfunc='mean'

d2
                Enrollment
Year                  2012  2013
County Hospital
A      a              44.0   NaN
       c               NaN   1.0
       d               NaN  89.0
       e              88.0   NaN
B      a              54.0  54.0
       b              55.0   NaN
       e               NaN  71.5
C      a               NaN  34.0
       b              27.0  65.0
       c              42.0   NaN
D      b               NaN  12.0
       c              55.0  23.0
       d              57.0   NaN
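The fractional values come from pivot_table's default aggfunc, which is 'mean': where a (County, Hospital, Year) combination occurs more than once, the enrollments are averaged, e.g. A/e in 2012 is (95 + 81) / 2 = 88.0 and B/e in 2013 is (76 + 67) / 2 = 71.5. A minimal check, assuming a reasonably recent pandas; the names mean_pivot and sum_pivot are only illustrative, and aggfunc='sum' is shown purely for comparison:

```python
import pandas as pd

# Same data as in the question: the first nine rows are 2012, the last nine 2013.
df = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D'] * 4 + ['A', 'B'],
    'Hospital': ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e',
                 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e'],
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})

# Default aggfunc='mean': duplicate (County, Hospital, Year) rows are averaged.
mean_pivot = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'])
# Same layout, but duplicates are added up instead of averaged.
sum_pivot = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'],
                           aggfunc='sum')

print(mean_pivot.loc[('A', 'e'), ('Enrollment', '2012')])  # 88.0
print(sum_pivot.loc[('A', 'e'), ('Enrollment', '2012')])   # 176.0
```

The sorting below works the same either way; only the numbers being summed per county change.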

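I need to sort the data frame so that the counties are ordered, in descending order, by their total enrollment in the most recent year (I'd like to avoid referring to '2013' directly), like this: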

                Enrollment
Year                  2012  2013
County Hospital
B      a                54    54
       b                55   NaN
       e               NaN  71.5
C      a               NaN    34
       b                27    65
       c                42   NaN
A      a                44   NaN
       c               NaN     1
       d               NaN    89
       e                88   NaN
D      b               NaN    12
       c                55    23
       d                57   NaN
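One way to get this county order without writing '2013' is to read the latest year off the column labels, total that column per county, and reorder only the outer index level by those totals. A minimal sketch, assuming a reasonably recent pandas; latest, totals, county_order and by_county are illustrative names. The Year labels are strings, so max() compares them as text, which works for four-digit years:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D'] * 4 + ['A', 'B'],
    'Hospital': ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e',
                 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e'],
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})
d2 = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'])

# Latest year read from the column labels instead of hard-coding '2013'.
latest = d2.columns.get_level_values('Year').max()

# Per-county total of the latest year; sum() skips the NaN cells.
totals = d2[('Enrollment', latest)].groupby(level='County').sum()
county_order = totals.sort_values(ascending=False).index

# Reorder only the County level; hospitals keep their order inside each county.
by_county = d2.reindex(index=county_order, level='County')
print(by_county)
```

This only reorders the counties (B, C, A, D); the hospitals inside each county stay in their original alphabetical order.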

Then I'd like the hospitals within each county sorted in descending order of their 2013 enrollment, like this:

                Enrollment
Year                  2012  2013
County Hospital
B      e               NaN  71.5
       a                54    54
       b                55   NaN
C      b                27    65
       a               NaN    34
       c                42   NaN
A      d               NaN    89
       c               NaN     1
       a                44   NaN
       e                88   NaN
D      c                55    23
       b               NaN    12
       d                57   NaN
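The within-county part on its own can be expressed with sort_values, which accepts index level names as well as column labels in by. A minimal sketch, assuming a reasonably recent pandas; latest and within are illustrative names. It sorts County ascending and the latest-year enrollment descending, so the counties stay alphabetical here, and NaN rows end up last inside each county:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D'] * 4 + ['A', 'B'],
    'Hospital': ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e',
                 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e'],
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})
d2 = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'])
latest = d2.columns.get_level_values('Year').max()

# 'County' is an index level name, ('Enrollment', latest) is a column label;
# NaN values are placed last by default (na_position='last').
within = d2.sort_values(by=['County', ('Enrollment', latest)],
                        ascending=[True, False])
print(within)
```

Getting the full target order additionally needs a per-row key for the county total, since the county order is not alphabetical.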

So far I've tried using groupby to get the sums and merge them back, but without any luck:

d2.groupby('County').sum()
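The groupby idea works once the county sums are mapped back onto the rows. A minimal sketch of that route, assuming a reasonably recent pandas; col, county_sums, row_totals, keys and result are illustrative names, and the sort runs on a separate key frame so d2 itself gets no extra column:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D'] * 4 + ['A', 'B'],
    'Hospital': ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e',
                 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e'],
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})
d2 = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'])

latest = d2.columns.get_level_values('Year').max()
col = ('Enrollment', latest)

# Step 1: per-county sums of the latest year (the groupby part).
county_sums = d2[col].groupby(level='County').sum()

# Step 2: "merge back" by looking up each row's County in county_sums.
row_totals = d2.index.get_level_values('County').map(county_sums)

# Step 3: sort rows by county total, then latest-year enrollment, both descending.
keys = pd.DataFrame({'total': row_totals.to_numpy(), 'latest': d2[col].to_numpy()},
                    index=d2.index)
order = keys.sort_values(['total', 'latest'], ascending=False).index
result = d2.loc[order]
print(result)
```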

Thanks in advance!


Answer (score: 1):

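You can broadcast each county's latest-year total back onto its rows with a groupby transform, then sort on that total and on the latest-year column: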

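max_col = max(d2.columns.get_level_values('Year'))  # latest year, '2013' here
# broadcast each county's latest-year total onto all of that county's rows
d2['sum'] = d2.groupby(level='County').transform('sum').loc[:, ('Enrollment', max_col)]
# the columns have two levels, so the new column is labelled ('sum', '')
d2 = d2.sort_values([('sum', ''), ('Enrollment', max_col)], ascending=[False, False])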

which gives:

                Enrollment          sum
Year                  2012  2013       
County Hospital                        
B      e               NaN  71.5  125.5
       a              54.0  54.0  125.5
       b              55.0   NaN  125.5
C      b              27.0  65.0   99.0
       a               NaN  34.0   99.0
       c              42.0   NaN   99.0
A      d               NaN  89.0   90.0
       c               NaN   1.0   90.0
       a              44.0   NaN   90.0
       e              88.0   NaN   90.0
D      c              55.0  23.0   35.0
       b               NaN  12.0   35.0
       d              57.0   NaN   35.0
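If the helper column should not stay in the final frame, it can be dropped once the rows are sorted. A minimal end-to-end sketch of the approach above, assuming a reasonably recent pandas; assigning d2['sum'] on two-level columns creates the label ('sum', ''), which is why that tuple is passed to sort_values, and result is an illustrative name:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D'] * 4 + ['A', 'B'],
    'Hospital': ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e',
                 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'e'],
    'Enrollment': [44, 55, 42, 57, 95, 54, 27, 55, 81,
                   54, 65, 23, 89, 76, 34, 12, 1, 67],
    'Year': ['2012'] * 9 + ['2013'] * 9,
})
d2 = pd.pivot_table(df, index=['County', 'Hospital'], columns=['Year'])

max_col = d2.columns.get_level_values('Year').max()  # '2013' here
d2['sum'] = d2.groupby(level='County').transform('sum').loc[:, ('Enrollment', max_col)]
d2 = d2.sort_values([('sum', ''), ('Enrollment', max_col)], ascending=[False, False])

# drop(..., level=0) removes every column whose top-level label is 'sum'.
result = d2.drop(columns='sum', level=0)
print(result)
```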