Suppose I have the following DataFrame:
id1 dt id2 count
0 2010-02-06 07:21:45 id0 78
0 2010-02-06 07:21:45 id1 79
0 2010-02-06 07:21:45 id2 80
0 2010-02-06 07:21:45 id3 69
0 2010-02-06 07:58:25 id4 58
1 2010-02-06 07:58:25 id0 67
For each "dt", I want to keep the n "id2" values with the highest "count". So for n = 3 I want to get:
id1 dt id2 count
0 2010-02-06 07:21:45 id0 78
0 2010-02-06 07:21:45 id1 79
0 2010-02-06 07:21:45 id2 80
0 2010-02-06 07:58:25 id4 58
1 2010-02-06 07:58:25 id0 67
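A minimal sketch of the basic top-n-per-"dt" selection, assuming pandas is available and that the sample above is rebuilt by hand (the DataFrame construction is only illustrative). It sorts by "count" descending, keeps the first n rows of each "dt" group, then restores the original row order:

```python
import pandas as pd

# Sample data from the question (first input: no repeated id2 within a dt)
df = pd.DataFrame({
    "id1": [0, 0, 0, 0, 0, 1],
    "dt": ["2010-02-06 07:21:45"] * 4 + ["2010-02-06 07:58:25"] * 2,
    "id2": ["id0", "id1", "id2", "id3", "id4", "id0"],
    "count": [78, 79, 80, 69, 58, 67],
})

n = 3
top_n = (
    df.sort_values("count", ascending=False, kind="mergesort")  # highest count first, stable for ties
      .groupby("dt")
      .head(n)        # first n rows of each dt in the sorted order
      .sort_index()   # back to the original row order
)
print(top_n)
```

Note that this alone does not deal with an "id2" repeated within the same "dt"; both copies could survive the cut.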
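It should also handle an "id2" that appears more than once for the same "dt", keeping only its highest-"count" row. So if the input is: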
id1 dt id2 count
0 2010-02-06 07:21:45 id0 78
0 2010-02-06 07:21:45 id1 79
0 2010-02-06 07:21:45 id2 80
0 2010-02-06 07:21:45 id2 79
0 2010-02-06 07:21:45 id3 69
0 2010-02-06 07:58:25 id4 58
1 2010-02-06 07:58:25 id0 67
it must return, for n = 3:
id1 dt id2 count
0 2010-02-06 07:21:45 id0 78
0 2010-02-06 07:21:45 id1 79
0 2010-02-06 07:21:45 id2 80
0 2010-02-06 07:58:25 id4 58
1 2010-02-06 07:58:25 id0 67
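One way to cover the repeated "id2" case is to drop duplicate ("dt", "id2") pairs after sorting and before the top-n cut. A sketch under the same assumptions (pandas, hand-built sample data); `top_n_per_dt` is a hypothetical helper name:

```python
import pandas as pd

def top_n_per_dt(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep, for each dt, the n distinct id2 values with the highest count."""
    ranked = df.sort_values("count", ascending=False, kind="mergesort")
    # After sorting, the first occurrence of each (dt, id2) pair is its highest-count row
    deduped = ranked.drop_duplicates(subset=["dt", "id2"], keep="first")
    # Take the first n rows per dt, then restore the original row order
    return deduped.groupby("dt").head(n).sort_index()

# Second input from the question: id2 appears twice at 07:21:45
df = pd.DataFrame({
    "id1": [0, 0, 0, 0, 0, 0, 1],
    "dt": ["2010-02-06 07:21:45"] * 5 + ["2010-02-06 07:58:25"] * 2,
    "id2": ["id0", "id1", "id2", "id2", "id3", "id4", "id0"],
    "count": [78, 79, 80, 79, 69, 58, 67],
})

result = top_n_per_dt(df, 3)
print(result)
```

Because `drop_duplicates` keeps the original index labels, `sort_index()` returns the rows in the same order and with the same columns as the expected output above.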
Answer (score: 1)
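This should work:
# Keep the highest-count row per (dt, id2); first() returns these rows sorted by (dt, id2), not by count
df = df.sort_values("count", ascending=False).groupby(["dt", "id2"], as_index=False).first()
# Re-sort by count within each dt before keeping the top 3; otherwise the cut follows id2 order
df = df.sort_values(["dt", "count"], ascending=[True, False]).groupby("dt").head(3).reset_index(drop=True)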
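An alternative sketch that does not depend on the row order a groupby produces: pick each ("dt", "id2") pair's highest-count row with `idxmax`, then keep the rows whose per-"dt" rank by "count" is at most n. Ties in "count" are broken by row position here (`method="first"`), which is an assumption the question does not specify; `top_n_by_rank` is a hypothetical name and the sample data is rebuilt by hand:

```python
import pandas as pd

def top_n_by_rank(df: pd.DataFrame, n: int) -> pd.DataFrame:
    # Index label of the highest-count row for every (dt, id2) pair
    best_labels = df.groupby(["dt", "id2"])["count"].idxmax().to_numpy()
    best = df.loc[best_labels]
    # Rank rows within each dt by count, highest first; ties get distinct ranks by position
    rank = best.groupby("dt")["count"].rank(method="first", ascending=False)
    # Keep the top n per dt and restore the original row order
    return best[rank <= n].sort_index()

df = pd.DataFrame({
    "id1": [0, 0, 0, 0, 0, 0, 1],
    "dt": ["2010-02-06 07:21:45"] * 5 + ["2010-02-06 07:58:25"] * 2,
    "id2": ["id0", "id1", "id2", "id2", "id3", "id4", "id0"],
    "count": [78, 79, 80, 79, 69, 58, 67],
})

ranked_result = top_n_by_rank(df, 3)
print(ranked_result)
```

The boolean mask `rank <= n` shares `best`'s index, so the selection lines up row by row without any reliance on sort order.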