Sorting a pandas DataFrame by group

Time: 2019-06-24 22:08:52

Tags: pandas

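I want to sort a DataFrame according to some priority rules.

I have achieved this with the code below, but I think it is a very hacky solution.

Is there a more idiomatic pandas way to do it?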

import pandas as pd
import numpy as np

df=pd.DataFrame({"Primary Metric":[80,100,90,100,80,100,80,90,90,100,90,90,80,90,90,80,80,80,90,90,100,80,80,100,80],
                "Secondary Metric Flag":[0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0],
                "Secondary Value":[15, 59, 70, 56, 73, 88, 83, 64, 12, 90, 64, 18, 100, 79, 7, 71, 83, 3, 26, 73, 44, 46, 99,24, 20],
                "Final Metric":[222, 883, 830, 907, 589, 93, 479, 498, 636, 761, 851, 349, 25, 405, 132, 491, 253, 318, 183, 635, 419, 885, 305, 258, 924]})


Primary_List=list(np.unique(df['Primary Metric']))
Primary_List.sort(reverse=True)


parts = []

for p in Primary_List:
    lol = df[df["Primary Metric"] == p]

    # Flag == 1: order by secondary value, then final metric (both descending).
    pt1 = lol[lol["Secondary Metric Flag"] == 1].sort_values(by=['Secondary Value', 'Final Metric'], ascending=[False, False])

    # Flag == 0: order by final metric only (descending).
    pt0 = lol[lol["Secondary Metric Flag"] == 0].sort_values(["Final Metric"], ascending=False)

    parts.extend([pt1, pt0])

# DataFrame.append was removed in pandas 2.0; pd.concat builds the result instead.
df_sorted = pd.concat(parts)

df_sorted

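The priority rules are:

  1. Sort by "Primary Metric" first, then by "Secondary Metric Flag".

  2. If 'Secondary Metric Flag' == 1, sort by "Secondary Value", then by "Final Metric".

    • If it is == 0, go straight to "Final Metric".

Any feedback is appreciated.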

2 answers:

Answer 0: (score: 2)

You don't need a loop or groupby here; just split the frame on the flag and call sort_values on each part.

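# Sort each part with "Primary Metric" descending, as the question requires.
df1=df.loc[df['Secondary Metric Flag']==1].sort_values(by=['Primary Metric','Secondary Value', 'Final Metric'], ascending=[False,False, False])
df0=df.loc[df['Secondary Metric Flag']==0].sort_values(["Primary Metric","Final Metric"],ascending = [False,False])

# A stable sort keeps flag-1 rows ahead of flag-0 rows and preserves the order inside each part.
df=pd.concat([df1,df0]).sort_values('Primary Metric', ascending=False, kind='mergesort')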
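
As a variation that is not part of the original answer, the split can probably be avoided altogether with a temporary helper column. The sketch below assumes the question's column names; the sample rows and the `_masked` column name are made up for illustration. Multiplying the secondary value by the flag zeroes it out for flag-0 rows, so a single descending `sort_values` call covers both branches of the rules.

```python
import pandas as pd

# Small made-up frame that reuses the question's column names.
df = pd.DataFrame({
    "Primary Metric":        [80, 100, 100, 80, 100, 80],
    "Secondary Metric Flag": [0, 1, 0, 1, 1, 0],
    "Secondary Value":       [15, 59, 70, 56, 88, 83],
    "Final Metric":          [222, 883, 830, 907, 93, 479],
})

# "_masked" is a hypothetical helper column: it holds the secondary value
# where the flag is 1 and 0 where the flag is 0, so it cannot reorder
# the flag-0 rows.
sort_cols = ["Primary Metric", "Secondary Metric Flag", "_masked", "Final Metric"]

result = (
    df.assign(_masked=df["Secondary Metric Flag"] * df["Secondary Value"])
      .sort_values(sort_cols, ascending=False)
      .drop(columns="_masked")
)

print(result)
```

The helper column is dropped at the end, so the result has the same columns as the input.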

Answer 1: (score: 1)

sorted and loc

def k(t):
    p, s, v, f = df.loc[t]
    return (-p, -s, -s * v, -f)

df.loc[sorted(df.index, key=k)]

    Primary Metric  Secondary Metric Flag  Secondary Value  Final Metric
9              100                      1               90           761
5              100                      1               88            93
1              100                      1               59           883
3              100                      1               56           907
23             100                      1               24           258
20             100                      0               44           419
13              90                      1               79           405
19              90                      1               73           635
7               90                      1               64           498
11              90                      1               18           349
10              90                      0               64           851
2               90                      0               70           830
8               90                      0               12           636
18              90                      0               26           183
14              90                      0                7           132
15              80                      1               71           491
21              80                      1               46           885
17              80                      1                3           318
24              80                      0               20           924
4               80                      0               73           589
6               80                      0               83           479
22              80                      0               99           305
16              80                      0               83           253
0               80                      0               15           222
12              80                      0              100            25
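
To see why a key of this shape reproduces the rules, here is a minimal sketch on plain tuples. The four rows are made up, and the tuple layout (primary, flag, secondary value, final) is an assumption that mirrors the column order in the question.

```python
# Rows as (primary, flag, secondary_value, final) tuples; the values are made up.
rows = [
    (90, 0, 70, 830),
    (90, 1, 64, 498),
    (90, 0, 64, 851),
    (90, 1, 79, 405),
]

def k(row):
    p, s, v, f = row
    # Negating each field turns Python's ascending sort into a descending one.
    # s * v equals v when the flag is 1 and 0 when the flag is 0, so the
    # secondary value only breaks ties among flagged rows.
    return (-p, -s, -s * v, -f)

ordered = sorted(rows, key=k)
print(ordered)
```

The two flagged rows come first, ordered by secondary value; the two unflagged rows follow, ordered by final metric alone even though their secondary values differ.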

sorted and itertuples

def k(t):
    _, p, s, v, f = t
    return (-p, -s, -s * v, -f)

idx, *tups = zip(*sorted(df.itertuples(), key=k))

pd.DataFrame(dict(zip(df, tups)), idx)

lexsort

p = df['Primary Metric']
s = df['Secondary Metric Flag']
v = df['Secondary Value']
f = df['Final Metric']
a = np.lexsort([
    -p, -s, -s * v, -f
][::-1])

df.iloc[a]
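
The `[::-1]` above is needed because `np.lexsort` treats the last key in the sequence as the primary sort key. A small sketch with made-up arrays (the names `primary` and `secondary` are illustrative only) shows the convention:

```python
import numpy as np

primary = np.array([1, 0, 1, 0])
secondary = np.array([3, 2, 1, 0])

# np.lexsort uses the LAST key as the primary one, so the most significant
# key goes at the end of the list.
order = np.lexsort([secondary, primary])

# Writing the keys in priority order and reversing the list is equivalent.
order_reversed = np.lexsort([primary, secondary][::-1])

print(order)
```

Listing the keys in priority order and reversing them, as the answer does, keeps the code readable while still satisfying that convention.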

Construct a new DataFrame

df.mul([-1, -1, 1, -1]).assign(
    **{'Secondary Value': lambda d: d['Secondary Metric Flag'] * d['Secondary Value']}
).pipe(
    lambda d: df.loc[d.sort_values([*d]).index]
)
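
The chained expression above is dense, so here is one way it might be unpacked into separate steps. This is a sketch on a small made-up frame with the question's column names; the intermediate names `keys` and `order` are hypothetical.

```python
import pandas as pd

# Small made-up frame that reuses the question's column names.
df = pd.DataFrame({
    "Primary Metric":        [80, 100, 100, 80, 100, 80],
    "Secondary Metric Flag": [0, 1, 0, 1, 1, 0],
    "Secondary Value":       [15, 59, 70, 56, 88, 83],
    "Final Metric":          [222, 883, 830, 907, 93, 479],
})

# Step 1: negate every column except "Secondary Value", so that an ascending
# sort on the negated values corresponds to a descending sort on the originals.
keys = df.mul([-1, -1, 1, -1])

# Step 2: the negated flag is -1 or 0, so the product is -value for flagged
# rows and 0 for unflagged rows.
keys["Secondary Value"] = keys["Secondary Metric Flag"] * keys["Secondary Value"]

# Step 3: sort the key frame on all of its columns, left to right, and reuse
# the resulting index order to select rows from the untouched original frame.
order = keys.sort_values(list(keys.columns)).index
result = df.loc[order]

print(result)
```

Because only `keys` is modified, the values in `result` are the original, un-negated ones.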