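I have a list of tuples, each containing a timestamp. For each ID in the first position of the tuples, I want to compute the latest timestamp minus the oldest timestamp.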
example_output = [(2038, 'A', [Timestamp('2010-01-24 00:00:00') - Timestamp('2010-02-20 00:00:00')]), (2038, 'B', [Timestamp('2017-01-24 00:00:00') - Timestamp('2017-02-20 00:00:00')])]
This has to be done for all the IDs.
abc = [(2038, 'A', Timestamp('2010-01-24 00:00:00')),
(2038, 'A', Timestamp('2010-01-27 00:00:00')),
(2038, 'A', Timestamp('2010-01-30 00:00:00')),
(2038, 'A', Timestamp('2010-02-02 00:00:00')),
(2038, 'A', Timestamp('2010-02-06 00:00:00')),
(2038, 'A', Timestamp('2010-02-11 00:00:00')),
(2038, 'A', Timestamp('2010-02-18 00:00:00')),
(2038, 'A', Timestamp('2010-02-20 00:00:00')),
(2038, 'B', Timestamp('2017-01-24 00:00:00')),
(2038, 'B', Timestamp('2017-01-27 00:00:00')),
(2038, 'B', Timestamp('2017-01-30 00:00:00')),
(2038, 'B', Timestamp('2017-02-02 00:00:00')),
(2038, 'B', Timestamp('2017-02-06 00:00:00')),
(2038, 'B', Timestamp('2017-02-11 00:00:00')),
(2038, 'B', Timestamp('2017-02-18 00:00:00')),
(2038, 'B', Timestamp('2017-02-20 00:00:00')),
(2120, 'A', Timestamp('2010-01-24 00:00:00'))]
Is it the right approach to put all the IDs into a list and then compute the minimum and maximum dates?
d = {}
l = []
for r in abc:
    l.append(r)
    if r[0] not in d:
        d[r[0]] = r[1],[r[2]]
print(d)
Answer 0 (score: 2)
Since you are already using pandas, you can use pd.DataFrame.groupby:
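import pandas as pd

# Group by the first two tuple fields and take max - min of the dates in each group
res = pd.DataFrame(abc, columns=['Year', 'Category', 'Date'])\
        .groupby(['Year', 'Category'])['Date'].agg(lambda x: x.max() - x.min())\
        .reset_index()
print(res)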
   Year Category    Date
0  2038        A 27 days
1  2038        B 27 days
2  2120        A  0 days
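If you would rather stay with the dictionary idea from the question, a minimal sketch could look like the following. It keys the dictionary on the (ID, category) pair and keeps only the earliest and latest timestamp seen for each key. Note that `span_per_group` is a hypothetical helper name, and the sample uses standard-library `datetime` objects as stand-ins for pandas `Timestamp` values; the same logic should work with any values that support comparison and subtraction.

```python
from datetime import datetime


def span_per_group(rows):
    """Return {(id, category): latest - earliest} for (id, category, timestamp) rows.

    Hypothetical helper for illustration; not part of pandas.
    """
    bounds = {}  # (id, category) -> (earliest, latest)
    for id_, category, ts in rows:
        key = (id_, category)
        if key in bounds:
            lo, hi = bounds[key]
            bounds[key] = (min(lo, ts), max(hi, ts))
        else:
            # First time this key is seen: earliest and latest are the same
            bounds[key] = (ts, ts)
    return {key: hi - lo for key, (lo, hi) in bounds.items()}


# Assumed sample data: stdlib datetimes standing in for pandas Timestamps
sample = [
    (2038, 'A', datetime(2010, 1, 24)),
    (2038, 'A', datetime(2010, 2, 20)),
    (2038, 'B', datetime(2017, 1, 24)),
    (2038, 'B', datetime(2017, 2, 20)),
    (2120, 'A', datetime(2010, 1, 24)),
]
result = span_per_group(sample)
print(result)
```

Unlike the loop in the question, this updates the entry for a key every time it appears, so later rows are not dropped. For larger data the groupby version is probably the more convenient choice.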