Reduce a pandas DataFrame by keeping the n largest values

Time: 2020-02-15 22:06:36

Tags: pandas

Suppose I have the following DataFrame:

id1    dt                   id2   count
0      2010-02-06 07:21:45  id0   78
0      2010-02-06 07:21:45  id1   79
0      2010-02-06 07:21:45  id2   80
0      2010-02-06 07:21:45  id3   69
0      2010-02-06 07:58:25  id4   58
1      2010-02-06 07:58:25  id0   67

For each "dt", I want to keep the n "id2" rows with the highest "count". So for nth = 3 I would get:

id1    dt                   id2   count
0      2010-02-06 07:21:45  id0   78
0      2010-02-06 07:21:45  id1   79
0      2010-02-06 07:21:45  id2   80
0      2010-02-06 07:58:25  id4   58
1      2010-02-06 07:58:25  id0   67

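It should also handle an "id2" that appears more than once within the same "dt", keeping only its row with the highest "count". So if the input is: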

id1    dt                   id2   count
0      2010-02-06 07:21:45  id0   78
0      2010-02-06 07:21:45  id1   79
0      2010-02-06 07:21:45  id2   80
0      2010-02-06 07:21:45  id2   79
0      2010-02-06 07:21:45  id3   69
0      2010-02-06 07:58:25  id4   58
1      2010-02-06 07:58:25  id0   67

it must return, for nth = 3:

id1    dt                   id2   count
0      2010-02-06 07:21:45  id0   78
0      2010-02-06 07:21:45  id1   79
0      2010-02-06 07:21:45  id2   80
0      2010-02-06 07:58:25  id4   58
1      2010-02-06 07:58:25  id0   67

1 answer:

Answer 0: (score: 1)

This should work:

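# Keep the highest count for each (dt, id2) pair; note that groupby re-sorts the rows by key.
df = df.sort_values("count", ascending=False).groupby(["dt", "id2"], as_index=False).first()
# Re-sort by count so that the first 3 rows of each dt are the 3 largest counts.
df = df.sort_values("count", ascending=False).groupby("dt").head(3).reset_index(drop=True)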
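
For a version that can be run end to end, here is a minimal sketch built on the second example input from the question. The helper name keep_top_n and its parameter n are hypothetical and not part of the original answer, and it swaps in drop_duplicates and groupby(...).head(n), which is one possible way to express the same idea. The point it illustrates is that the rows must be in descending "count" order at the moment the first n rows per "dt" are taken.

```python
import pandas as pd


def keep_top_n(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Keep, for each dt, the n distinct id2 rows with the highest count."""
    # Sort by count so the highest-count row of each (dt, id2) pair comes first.
    ranked = df.sort_values("count", ascending=False, kind="mergesort")
    # Drop repeated id2 values within a dt, keeping the highest count.
    deduped = ranked.drop_duplicates(subset=["dt", "id2"], keep="first")
    # Rows are still in descending count order, so head(n) is the top n per dt.
    top = deduped.groupby("dt").head(n)
    # Restore the row order of the input.
    return top.sort_index()


# Second example input from the question (id2 appears twice in the first dt).
df = pd.DataFrame(
    {
        "id1": [0, 0, 0, 0, 0, 0, 1],
        "dt": ["2010-02-06 07:21:45"] * 5 + ["2010-02-06 07:58:25"] * 2,
        "id2": ["id0", "id1", "id2", "id2", "id3", "id4", "id0"],
        "count": [78, 79, 80, 79, 69, 58, 67],
    }
)

result = keep_top_n(df, n=3)
print(result)
```

Because the result is sorted back by its original index, the surviving rows appear in the same order as in the input, which matches the layout of the expected output in the question. If the original index is not needed, reset_index(drop=True) can be chained onto the result.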