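I have already looked at this question, but the result asked for there is slightly different from what I need.
Imagine a DataFrame grouped like this: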
df.groupby(['product_name', 'usage_type']).total_cost.sum()
product_name  usage_type
Lorem         A                30.694665
              B                 0.000634
              C                 1.659360
              D                 0.000031
              E              3339.140042
              F                 0.074340
Ipsum         G                 9.627360
              A                19.053377
              D                14.492155
Dolor         B                 9.698245
              H              6993.792163
              C             31947.955679
              D              2150.400001
              E                26.337789
Name: total_cost, dtype: float64
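The output I want has the same structure, but sorted so that the most costly product appears first while the breakdown by usage type is still preserved.
If it makes things simpler, I can drop the secondary sort by usage type.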
Answer 0 (score: 5)
Starting from the grouped DataFrame:
import pandas as pd
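# Read whitespace-separated data; the raw string keeps '\s' from being treated as a string escape.
df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])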
#                             val
# product_name  usage_type
# Lorem         A                30.694665
#               B                 0.000634
#               C                 1.659360
#               D                 0.000031
#               E              3339.140042
#               F                 0.074340
# Ipsum         G                 9.627360
#               A                19.053377
#               D                14.492155
# Dolor         B                 9.698245
#               H              6993.792163
#               C             31947.955679
#               D              2150.400001
#               E                26.337789
You can store the sort keys in new columns:
df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
df2['key2'] = df2.index.get_level_values('usage_type')
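As a rough illustration of why `transform('sum')` is useful here, the sketch below runs it on a tiny made-up frame. The name `demo` and its values are hypothetical and not part of the question's data. Unlike a plain `.sum()`, which returns one row per group, `transform` returns one value per original row, so the group total can be stored as a column and used as a sort key.

```python
import pandas as pd

# Hypothetical toy data, not the dataset from the question.
demo = pd.DataFrame(
    {
        "product_name": ["X", "X", "Y"],
        "usage_type": ["A", "B", "A"],
        "val": [1.0, 2.0, 10.0],
    }
).set_index(["product_name", "usage_type"])

# One value per original row: the total of the row's product group.
group_total = demo.groupby(level="product_name")["val"].transform("sum")
print(group_total.tolist())  # [3.0, 3.0, 10.0]
```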
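Then sort by these key columns (DataFrame.sort was removed in later pandas releases; sort_values is its replacement):
# >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])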
#                             val           key1  key2
# product_name  usage_type
# Dolor         B                 9.698245   41128.183877     B
#               C             31947.955679   41128.183877     C
#               D              2150.400001   41128.183877     D
#               E                26.337789   41128.183877     E
#               H              6993.792163   41128.183877     H
# Lorem         A                30.694665    3371.569072     A
#               B                 0.000634    3371.569072     B
#               C                 1.659360    3371.569072     C
#               D                 0.000031    3371.569072     D
#               E              3339.140042    3371.569072     E
#               F                 0.074340    3371.569072     F
# Ipsum         A                19.053377      43.172892     A
#               D                14.492155      43.172892     D
#               G                 9.627360      43.172892     G
# Sort by group total (descending), then usage type (ascending), and keep only the values.
result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
print(result)
which yields
product_name  usage_type
Dolor         B                 9.698245
              C             31947.955679
              D              2150.400001
              E                26.337789
              H              6993.792163
Lorem         A                30.694665
              B                 0.000634
              C                 1.659360
              D                 0.000031
              E              3339.140042
              F                 0.074340
Ipsum         A                19.053377
              D                14.492155
              G                 9.627360
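For anyone who wants to run the whole thing without a separate `data` file, here is a minimal self-contained sketch of the same approach. It assumes a pandas version that provides `sort_values`, and it builds the frame inline from the values shown in the question. The variable name `rows` is only for illustration.

```python
import pandas as pd

# Values copied from the question; building them inline is an assumption
# made so the sketch runs without the 'data' file used above.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
df2 = pd.DataFrame(rows, columns=["product_name", "usage_type", "val"])
df2 = df2.set_index(["product_name", "usage_type"])

# key1: total cost of the row's product; key2: the usage type label.
df2["key1"] = df2.groupby(level="product_name")["val"].transform("sum")
df2["key2"] = df2.index.get_level_values("usage_type")

# Most costly product first, usage types in alphabetical order within it.
result = df2.sort_values(["key1", "key2"], ascending=[False, True])["val"]
print(result)
```

If the secondary sort is not needed, sorting on `key1` alone with `kind="stable"` should keep the original row order inside each product.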