Question

Is it possible to sort a dataframe so that rows sharing the same outer index stay together?

My df:

             budget population
state   fu      
acre    ac1  600    50
        ac2  25     110
bahia   ba1  2300   80
        ba2   1     10
paulo   sp1  1000   100
        sp2  1000   230

I would like to get the output below, because the bahia index has the highest total budget:

             budget population
state   fu      
bahia   ba1  2300   80
        ba2   1     10
paulo   sp1  1000   100
        sp2  1000   230
acre    ac1  600    50
        ac2  25     110

But after using sort_values(), I get the following output:

              budget population
state   fu      
bahia   ba1   2300   80
paulo   sp1   1000   100
        sp2   1000   230
acre    ac1   600    50
        ac2   25     110
bahia   ba2    1     10
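
For reference, the behaviour above can be reproduced with a small sketch like the one below. The frame is rebuilt by hand from the table in the question, and the exact call (sorting on budget in descending order) is an assumption, since the question only mentions sort_values(). A stable sort kind is used so the two tied paulo rows keep their order.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the table above)
df = pd.DataFrame(
    {
        "state": ["acre", "acre", "bahia", "bahia", "paulo", "paulo"],
        "fu": ["ac1", "ac2", "ba1", "ba2", "sp1", "sp2"],
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    }
).set_index(["state", "fu"])

# Assumed call: a row-wise sort on budget, which ignores the state grouping
flat = df.sort_values("budget", ascending=False, kind="mergesort")
print(flat)
```

Because each row is ranked on its own budget, bahia's ba2 row (budget 1) ends up last, separated from ba1.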

I updated the question to provide more context.

Answer 1

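There are several ways to do this. One is to compute the metric to sort on (the total budget per state), sort the dataframe by it, and then drop the helper column.

To be able to merge correctly, we have to reset the index of the original dataframe.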

# Create the total budget per state
gp = df.groupby('state')['budget'].sum().reset_index()
gp.columns = ['state', 'total_budget']

# Merge the total budget back onto the individual rows
out = df.reset_index().merge(gp, on='state')

# Sort by total_budget; a stable sort keeps each state's rows in their original order
out = out.sort_values('total_budget', ascending=False, kind='mergesort')
out = out.drop(columns='total_budget')
out = out.set_index(['state', 'fu'])

The final output looks like this:

           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
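
The same idea can also be sketched without the merge, by broadcasting each state's total onto its rows with groupby(...).transform and reordering positionally. This is only an illustration under the assumption that df has the (state, fu) MultiIndex shown in the question; the names total and order are made up for the example.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame(
    {
        "state": ["acre", "acre", "bahia", "bahia", "paulo", "paulo"],
        "fu": ["ac1", "ac2", "ba1", "ba2", "sp1", "sp2"],
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    }
).set_index(["state", "fu"])

# Total budget of each state, repeated on every row of that state
total = df.groupby(level="state")["budget"].transform("sum")

# Row positions ordered by descending total; the stable sort keeps
# the original row order inside each state
order = np.argsort(-total.to_numpy(), kind="stable")
out = df.iloc[order]
print(out)
```

Since no helper column is ever added to df, there is nothing to drop afterwards and the index does not need to be reset.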

Besides that, a more compact solution (with pandas imported as pd and numpy as np) is

out = pd.concat([x[1] for x in sorted(df.reset_index().groupby('state'), key = lambda x : -np.sum(x[1].budget) )]).set_index(['state','fu'])

Here, out is the same as before.
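
A further variation, not part of the answer above, is to compute the desired order of the states once and reindex on the state level. The sketch below assumes the frame from the question; state_order is a name introduced just for this example.

```python
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame(
    {
        "state": ["acre", "acre", "bahia", "bahia", "paulo", "paulo"],
        "fu": ["ac1", "ac2", "ba1", "ba2", "sp1", "sp2"],
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    }
).set_index(["state", "fu"])

# States ordered by total budget, largest first
state_order = (
    df.groupby(level="state")["budget"]
    .sum()
    .sort_values(ascending=False, kind="mergesort")
    .index
)

# Reorder whole blocks of rows by the state level of the index
out = df.reindex(state_order, level="state")
print(out)
```

This keeps the sorting logic at the level of whole states, so the rows inside each state are never compared with one another.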
