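How do I sort groupby groups by the sum of each group?

Asked: 2018-12-20 22:32:23

Tags: pandas python-2.7

I have a problem that is hard to explain. I have a dataframe whose rows are grouped into chunks of 4. Every row has a column named "Value".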

                 Name      Role  Cost  Value  
0       Johnny Tsunami   Driver  1000     39
1   Michael B. Jackson   Pistol  2500     46
2           Bobby Zuko   Pistol  3000     50
3         Greg Ritcher  Lookout   200     25
4       Johnny Tsunami   Driver  1000     39
5   Michael B. Jackson   Pistol  2500     46
6           Bobby Zuko   Pistol  3000     50
7          Appa Derren  Lookout   250     30
8          Baby Hitsuo   Driver   950     35
9   Michael B. Jackson   Pistol  2500     46
10          Bobby Zuko   Pistol  3000     50
11         Appa Derren  Lookout   250     30

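Basically, I want to sort these groups in descending order by the sum of Value within each group.

It seems like it should be simple. I have tried many things and run into various errors, such as: sum() is not an attribute, str problems, and DataFrame object problems. I have tried the sort, sum, lambda, and agg functions. I can't believe I am having trouble sorting a groupby in descending order. Here is a snippet and a visual.

The groupby basically does this to the dataframe above: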

0
                 Name     Role  Cost  Value
0      Johnny Tsunami   Driver  1000     39
1  Michael B. Jackson   Pistol  2500     46
2          Bobby Zuko   Pistol  3000     50
3        Greg Ritcher  Lookout   200     25

Cost: 6700   Value: 160

1
                 Name     Role  Cost  Value
4      Johnny Tsunami   Driver  1000     39
5  Michael B. Jackson   Pistol  2500     46
6          Bobby Zuko   Pistol  3000     50
7         Appa Derren  Lookout   250     30

Cost: 6750   Value: 165

2
                  Name     Role  Cost  Value
8          Baby Hitsuo   Driver   950     35
9   Michael B. Jackson   Pistol  2500     46
10          Bobby Zuko   Pistol  3000     50
11         Appa Derren  Lookout   250     30

Cost: 6700   Value: 161

After sorting, I want to print the dataframe and get this final result:

4       Johnny Tsunami   Driver  1000     39
5   Michael B. Jackson   Pistol  2500     46
6           Bobby Zuko   Pistol  3000     50
7          Appa Derren  Lookout   250     30
8          Baby Hitsuo   Driver   950     35
9   Michael B. Jackson   Pistol  2500     46
10          Bobby Zuko   Pistol  3000     50
11         Appa Derren  Lookout   250     30
0       Johnny Tsunami   Driver  1000     39
1   Michael B. Jackson   Pistol  2500     46
2           Bobby Zuko   Pistol  3000     50
3         Greg Ritcher  Lookout   200     25

Here is the dataframe and the code:

from pprint import pprint
import pandas as pd
import numpy as np

data= [['Johnny Tsunami','Driver',1000,39],
['Michael B. Jackson','Pistol',2500,46],
['Bobby Zuko','Pistol',3000,50],
['Greg Ritcher','Lookout',200,25],
['Johnny Tsunami','Driver',1000,39],
['Michael B. Jackson','Pistol',2500,46],
['Bobby Zuko','Pistol',3000,50],
['Appa Derren','Lookout',250,30],
['Baby Hitsuo','Driver',950,35],
['Michael B. Jackson','Pistol',2500,46],
['Bobby Zuko','Pistol',3000,50],
['Appa Derren','Lookout',250,30]]

df = pd.DataFrame(data,columns=['Name','Role','Cost','Value'])

# group the rows into consecutive chunks of 4
gr = df.groupby(np.arange(len(df.index))/4)

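2 Answers:

Answer 0 (score: 2)

This is what I would do:

First build the groups of 4, sort them by their sum, and save the resulting index order. (I changed the code that builds the groups so that it uses integer division.)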

gr = df.groupby(np.arange(len(df.index.values))//4)
grp_order = (gr.sum()).sort_values('Value', ascending=False).index
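As a side note on the integer division mentioned above, here is a minimal sketch of why it matters. The names `n_rows`, `group_size`, `floor_keys`, and `true_keys` are illustrative only and do not come from the original post. The assumption is that the code may be run under Python 3, where `/` means true division.

```python
import numpy as np

n_rows = 12      # same number of rows as the example dataframe
group_size = 4

# Floor division maps row positions 0..11 to group labels 0, 0, 0, 0, 1, ...
floor_keys = np.arange(n_rows) // group_size

# True division (what "/" means in Python 3) gives a distinct float per row,
# so a groupby on these keys would put every row in its own group.
true_keys = np.true_divide(np.arange(n_rows), group_size)

print(floor_keys.tolist())
print(len(set(true_keys.tolist())))  # number of distinct group labels
```

With floor division there are three labels, one per chunk of four rows, which is the grouping the question describes.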

Then print the groups in the correct order:

for idx in grp_order:
    print(idx)
    print(gr.get_group(idx))
    print('Value: ', gr.get_group(idx).Value.sum())

Output:

1
                 Name     Role  Cost  Value
4      Johnny Tsunami   Driver  1000     39
5  Michael B. Jackson   Pistol  2500     46
6          Bobby Zuko   Pistol  3000     50
7         Appa Derren  Lookout   250     30
Value:  165
2
                  Name     Role  Cost  Value
8          Baby Hitsuo   Driver   950     35
9   Michael B. Jackson   Pistol  2500     46
10          Bobby Zuko   Pistol  3000     50
11         Appa Derren  Lookout   250     30
Value:  161
0
                 Name     Role  Cost  Value
0      Johnny Tsunami   Driver  1000     39
1  Michael B. Jackson   Pistol  2500     46
2          Bobby Zuko   Pistol  3000     50
3        Greg Ritcher  Lookout   200     25
Value:  160
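If a single reordered dataframe is wanted rather than printed groups (as in the question's desired result), one possible extension of the approach above is to concatenate the groups in ranked order. This is only a sketch: `sorted_df` is a name introduced here, and summing just the `Value` column is my own choice to avoid aggregating the text columns, not something the answer does.

```python
import numpy as np
import pandas as pd

data = [['Johnny Tsunami', 'Driver', 1000, 39],
        ['Michael B. Jackson', 'Pistol', 2500, 46],
        ['Bobby Zuko', 'Pistol', 3000, 50],
        ['Greg Ritcher', 'Lookout', 200, 25],
        ['Johnny Tsunami', 'Driver', 1000, 39],
        ['Michael B. Jackson', 'Pistol', 2500, 46],
        ['Bobby Zuko', 'Pistol', 3000, 50],
        ['Appa Derren', 'Lookout', 250, 30],
        ['Baby Hitsuo', 'Driver', 950, 35],
        ['Michael B. Jackson', 'Pistol', 2500, 46],
        ['Bobby Zuko', 'Pistol', 3000, 50],
        ['Appa Derren', 'Lookout', 250, 30]]
df = pd.DataFrame(data, columns=['Name', 'Role', 'Cost', 'Value'])

# Group consecutive rows into chunks of 4 using integer division.
gr = df.groupby(np.arange(len(df)) // 4)

# Rank the group labels by the sum of Value, largest first.
grp_order = gr['Value'].sum().sort_values(ascending=False).index

# Stack the groups in ranked order; the original row labels are preserved.
sorted_df = pd.concat([gr.get_group(idx) for idx in grp_order])
print(sorted_df)
```

The rows come out as group 1, then group 2, then group 0, with the order inside each group unchanged.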

Answer 1 (score: 2)

Use transform to create an additional key, then sort on that key with sort_values:

df['key']=df['Value'].groupby(np.arange(len(df))//4).transform('sum')
df=df.sort_values('key',ascending=False)
df
Out[104]:
                  Name     Role  Cost  Value  key
4       Johnny Tsunami   Driver  1000     39  165
5   Michael B. Jackson   Pistol  2500     46  165
6           Bobby Zuko   Pistol  3000     50  165
7          Appa Derren  Lookout   250     30  165
8          Baby Hitsuo   Driver   950     35  161
9   Michael B. Jackson   Pistol  2500     46  161
10          Bobby Zuko   Pistol  3000     50  161
11         Appa Derren  Lookout   250     30  161
0       Johnny Tsunami   Driver  1000     39  160
1   Michael B. Jackson   Pistol  2500     46  160
2           Bobby Zuko   Pistol  3000     50  160
3         Greg Ritcher  Lookout   200     25  160

Note that I did not drop the key I created for sorting; you can drop that column afterwards if you no longer need it.
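A self-contained sketch of the same idea, with two additions that are my own assumptions rather than part of the answer: `kind='mergesort'` requests a stable sort, so rows with equal keys keep their original relative order, and `drop(columns='key')` is one way to remove the helper column. The names `keys` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

data = [['Johnny Tsunami', 'Driver', 1000, 39],
        ['Michael B. Jackson', 'Pistol', 2500, 46],
        ['Bobby Zuko', 'Pistol', 3000, 50],
        ['Greg Ritcher', 'Lookout', 200, 25],
        ['Johnny Tsunami', 'Driver', 1000, 39],
        ['Michael B. Jackson', 'Pistol', 2500, 46],
        ['Bobby Zuko', 'Pistol', 3000, 50],
        ['Appa Derren', 'Lookout', 250, 30],
        ['Baby Hitsuo', 'Driver', 950, 35],
        ['Michael B. Jackson', 'Pistol', 2500, 46],
        ['Bobby Zuko', 'Pistol', 3000, 50],
        ['Appa Derren', 'Lookout', 250, 30]]
df = pd.DataFrame(data, columns=['Name', 'Role', 'Cost', 'Value'])

# One group label per row: 0 for rows 0-3, 1 for rows 4-7, 2 for rows 8-11.
keys = np.arange(len(df)) // 4

# transform('sum') broadcasts each group's total back onto its own rows.
df['key'] = df['Value'].groupby(keys).transform('sum')

# Stable sort on the helper column, then discard the helper column.
result = (df.sort_values('key', ascending=False, kind='mergesort')
            .drop(columns='key'))
print(result)
```

One caveat with sorting on the sum alone: if two different groups happened to have the same total, a stable sort would keep them in their original order, so their rows would stay contiguous here only because each group's rows are already adjacent.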