Pandas: sorting a DataFrame based on multiple conditions

Date: 2020-07-06 19:48:19

Tags: python pandas pandas-groupby

I have a DataFrame that looks like this:

Id    Name    Mag    Out      Des

23    Yah     1.0    base     n-0
23    Yah     1.0    base     n-0
23    Yah     1.0    base     n-0
24    Nah     0.99   base     n-0
24    Nah     1.01   line-2   line-2
24    Nah     0.95   line-3   line-3
24    Nah     1.1    line-4   line-4
25    lol     1.0    line-1   line-1
25    lol     1.1    line-3   line-3
25    lol     0.9    line-4   line-4
25    lol     0.95   line-5   line-5
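To reproduce the problem, the sample data above can be built by hand. This is a minimal sketch; the values are retyped from the table, and the variable name `df` matches the one used in the answer:

```python
import pandas as pd

# Sample data from the question, retyped by hand (default RangeIndex 0..10)
df = pd.DataFrame({
    "Id":   [23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25],
    "Name": ["Yah"] * 3 + ["Nah"] * 4 + ["lol"] * 4,
    "Mag":  [1.0, 1.0, 1.0, 0.99, 1.01, 0.95, 1.1, 1.0, 1.1, 0.9, 0.95],
    "Out":  ["base"] * 4 + ["line-2", "line-3", "line-4", "line-1", "line-3", "line-4", "line-5"],
    "Des":  ["n-0"] * 4 + ["line-2", "line-3", "line-4", "line-1", "line-3", "line-4", "line-5"],
})
print(df)
```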

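The output must satisfy the following conditions:

  1. For the same Id and Name, if the "Out" column contains only "base", report that item only once, using the first row.
  2. For the same Id and Name, if the "Out" column contains at least one "base" item, report the row(s) with "base" together with the rows holding the minimum and maximum values of the "Mag" column.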

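The two conditions map fairly directly onto per-group boolean masks built with `groupby(...).transform`. The sketch below is an alternative to the answer's approach, not the asker's code. It also assumes that a group with no "base" row keeps all of its rows, which is what the expected output does for Id 25 even though the conditions do not say so explicitly:

```python
import pandas as pd

df = pd.DataFrame({
    "Id":   [23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25],
    "Name": ["Yah"] * 3 + ["Nah"] * 4 + ["lol"] * 4,
    "Mag":  [1.0, 1.0, 1.0, 0.99, 1.01, 0.95, 1.1, 1.0, 1.1, 0.9, 0.95],
    "Out":  ["base"] * 4 + ["line-2", "line-3", "line-4", "line-1", "line-3", "line-4", "line-5"],
    "Des":  ["n-0"] * 4 + ["line-2", "line-3", "line-4", "line-1", "line-3", "line-4", "line-5"],
})

tmp = df.assign(is_base=df["Out"].eq("base"))
g = tmp.groupby(["Id", "Name"])

# Per-group flags, broadcast back to every row of the group
all_base = g["is_base"].transform("all")
any_base = g["is_base"].transform("any")
first_in_group = g.cumcount().eq(0)

# Rows whose Mag equals the group's minimum or maximum
is_extreme = df["Mag"].eq(g["Mag"].transform("min")) | df["Mag"].eq(g["Mag"].transform("max"))

keep = (
    (all_base & first_in_group)                                # condition 1
    | (~all_base & any_base & (tmp["is_base"] | is_extreme))   # condition 2
    | ~any_base                                                # assumed: no base row -> keep all
)
result = df[keep]
print(result)
```

Because every mask is aligned on the original index, the rows come out in their original order, with index labels 0, 3, 5, 6, 7, 8, 9, 10.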
The output must be in the following format:

Id    Name    Mag    Out      Des

23    Yah     1.0    base     n-0
24    Nah     0.99   base     n-0
24    Nah     0.95   line-3   line-3
24    Nah     1.1    line-4   line-4
25    lol     0.9    line-4   line-4
25    lol     0.95   line-5   line-5
25    lol     1.0    line-1   line-1
25    lol     1.1    line-3   line-3

1 answer:

Answer 0 (score: 1)

Here's one way to do it, split into a few steps for clarity:

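import pandas as pd

def check_base(x):
    # x is the "Out" column of a single (Id, Name) group
    if (x == "base").all():
        # Only base rows: keep the first one, drop the duplicates
        return ["keep"] + ["drop"] * (len(x) - 1)
    elif (x == "base").any():
        # Base rows are always kept; other rows survive only if Mag is a group min/max
        return ["keep" if i == "base" else "maybe" for i in x]
    else:
        # No base row at all: transform broadcasts "keep" to every row
        return "keep"

df["criteria"] = df.groupby(["Id", "Name"]).Out.transform(check_base)

# Per-group min and max of Mag, broadcast back to every row
g_min = df.groupby(["Id", "Name"]).Mag.transform("min")
g_max = df.groupby(["Id", "Name"]).Mag.transform("max")

# & binds tighter than |; the extra parentheses make that grouping explicit
df = df[(df.criteria == "keep") | ((df.criteria == "maybe") & ((df.Mag == g_min) | (df.Mag == g_max)))]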
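To see what each branch of `check_base` hands back to `transform`, the function can be called directly on small Series that mimic the three kinds of group in the sample data. The Series below are made up for illustration. A list must have the same length as the group, while the scalar `"keep"` is broadcast by `transform` to every row of the group:

```python
import pandas as pd

def check_base(x):
    if (x == "base").all():
        return ["keep"] + ["drop"] * (len(x) - 1)
    elif (x == "base").any():
        return ["keep" if i == "base" else "maybe" for i in x]
    else:
        return "keep"

# Hypothetical groups shaped like Id 23, Id 24 and Id 25 in the question
only_base = check_base(pd.Series(["base", "base", "base"]))
mixed = check_base(pd.Series(["base", "line-2", "line-3", "line-4"]))
no_base = check_base(pd.Series(["line-1", "line-3", "line-4", "line-5"]))

print(only_base)  # ['keep', 'drop', 'drop']
print(mixed)      # ['keep', 'maybe', 'maybe', 'maybe']
print(no_base)    # keep
```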

The result is:

    Id Name   Mag     Out     Des criteria
0   23  Yah  1.00    base     n-0     keep
3   24  Nah  0.99    base     n-0     keep
5   24  Nah  0.95  line-3  line-3    maybe
6   24  Nah  1.10  line-4  line-4    maybe
7   25  lol  1.00  line-1  line-1     keep
8   25  lol  1.10  line-3  line-3     keep
9   25  lol  0.90  line-4  line-4     keep
10  25  lol  0.95  line-5  line-5     keep
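The result still carries the helper `criteria` column and the original index labels, neither of which appears in the desired output. A minimal end-to-end sketch, with the sample data retyped from the question and the answer's steps kept the same in logic, that drops the helper column and renumbers the rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Id":   [23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25],
    "Name": ["Yah"] * 3 + ["Nah"] * 4 + ["lol"] * 4,
    "Mag":  [1.0, 1.0, 1.0, 0.99, 1.01, 0.95, 1.1, 1.0, 1.1, 0.9, 0.95],
    "Out":  ["base"] * 4 + ["line-2", "line-3", "line-4", "line-1", "line-3", "line-4", "line-5"],
    "Des":  ["n-0"] * 4 + ["line-2", "line-3", "line-4", "line-1", "line-3", "line-4", "line-5"],
})

def check_base(x):
    if (x == "base").all():
        return ["keep"] + ["drop"] * (len(x) - 1)
    elif (x == "base").any():
        return ["keep" if i == "base" else "maybe" for i in x]
    else:
        return "keep"

keys = ["Id", "Name"]
df["criteria"] = df.groupby(keys)["Out"].transform(check_base)
g_min = df.groupby(keys)["Mag"].transform("min")
g_max = df.groupby(keys)["Mag"].transform("max")

mask = (df["criteria"] == "keep") | (
    (df["criteria"] == "maybe") & ((df["Mag"] == g_min) | (df["Mag"] == g_max))
)

# Drop the helper column and renumber rows 0..n-1
out = df[mask].drop(columns="criteria").reset_index(drop=True)
print(out)
```

Rows stay in their original order. The expected output in the question lists the Id 25 rows by ascending Mag, which none of the stated conditions asks for, so that ordering is not reproduced here.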