The title may be a bit confusing, so here is an example.
From:
id | timestamp
1 | 2015-12-02 00:00:00
1 | 2015-12-03 00:00:00 <--- latest for id 1
2 | 2015-12-02 00:00:00
2 | 2015-12-04 00:00:00
2 | 2015-12-06 00:00:00 <--- latest for id 2
To:
id | timestamp
1 | 2015-12-03 00:00:00
2 | 2015-12-06 00:00:00
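Answer 0 (score: 2)
Use nth to take the last row of each group (this assumes the rows are already sorted by timestamp within each id):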
In [599]: df.groupby('id', as_index=False).nth(-1)
Out[599]:
   id            timestamp
1   1  2015-12-03 00:00:00
4   2  2015-12-06 00:00:00
Ideally, use max, since you want the latest date:
In [601]: df.groupby('id', as_index=False).max()
Out[601]:
   id            timestamp
0   1  2015-12-03 00:00:00
1   2  2015-12-06 00:00:00
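To illustrate why max is the safer choice here, a minimal sketch follows. It assumes the question's data is loaded into a DataFrame named df with a datetime timestamp column, and it deliberately shuffles the rows (the shuffled order is my own assumption, not something from the question). max compares values, while nth(-1) only picks whichever row comes last in each group.

```python
import pandas as pd

# The question's sample rows, shuffled on purpose so that they are
# NOT in timestamp order (the shuffling is an assumption for this demo).
df = pd.DataFrame({
    "id": [1, 2, 2, 1, 2],
    "timestamp": pd.to_datetime([
        "2015-12-03", "2015-12-06", "2015-12-02", "2015-12-02", "2015-12-04",
    ]),
})

# max() compares the timestamp values, so row order does not matter.
latest = df.groupby("id", as_index=False)["timestamp"].max()
print(latest)

# nth(-1) returns the last row of each group as it appears in the frame,
# which is not the latest timestamp when the rows are unsorted.
last_rows = df.groupby("id").nth(-1)
print(last_rows)
```

On unsorted data like this, only the max result matches the desired output, so nth(-1) should be preceded by a sort on timestamp if the row order is not guaranteed.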
Also, tail, as mentioned in the comments:
In [602]: df.groupby('id').tail(1)
Out[602]:
   id            timestamp
1   1  2015-12-03 00:00:00
4   2  2015-12-06 00:00:00
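If the frame has more columns than id and timestamp, a plain groupby max takes the maximum of each column separately, so values from different rows can end up mixed together. A rough sketch of two ways to keep the whole latest row is below. The value column is hypothetical (it is not in the question), and the idxmax variant is an alternative I am adding, not one of the methods above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2015-12-02", "2015-12-03", "2015-12-02", "2015-12-04", "2015-12-06",
    ]),
    "value": [10, 11, 20, 21, 22],  # hypothetical extra column, not in the question
})

# Sort by timestamp first so that "last row per group" really means "latest",
# then keep the last row of each id.
by_tail = df.sort_values("timestamp").groupby("id").tail(1)
print(by_tail)

# Alternative: find the index label of each group's maximum timestamp
# and select those rows from the original frame.
by_idxmax = df.loc[df.groupby("id")["timestamp"].idxmax()]
print(by_idxmax)
```

Both variants keep the original index labels (1 and 4 for this data), the same labels that appear in the nth and tail outputs above.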