Computing the min-max date difference per group in a list of tuples

Time: 2018-05-02 09:54:13

Tags: python pandas datetime tuples

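I have a list of tuples, each containing a timestamp. For each ID in the first position of the tuples, I want the latest timestamp minus the oldest timestamp.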

example_output = [(2038, 'A', [Timestamp('2010-02-20 00:00:00') - Timestamp('2010-01-24 00:00:00')]), (2038, 'B', [Timestamp('2017-02-20 00:00:00') - Timestamp('2017-01-24 00:00:00')])]

This has to be done for all the IDs.

from pandas import Timestamp

abc = [(2038, 'A', Timestamp('2010-01-24 00:00:00')),
(2038, 'A', Timestamp('2010-01-27 00:00:00')),
(2038, 'A', Timestamp('2010-01-30 00:00:00')),
(2038, 'A', Timestamp('2010-02-02 00:00:00')),
(2038, 'A', Timestamp('2010-02-06 00:00:00')),
(2038, 'A', Timestamp('2010-02-11 00:00:00')),
(2038, 'A', Timestamp('2010-02-18 00:00:00')),
(2038, 'A', Timestamp('2010-02-20 00:00:00')),
(2038, 'B', Timestamp('2017-01-24 00:00:00')),
(2038, 'B', Timestamp('2017-01-27 00:00:00')),
(2038, 'B', Timestamp('2017-01-30 00:00:00')),
(2038, 'B', Timestamp('2017-02-02 00:00:00')),
(2038, 'B', Timestamp('2017-02-06 00:00:00')),
(2038, 'B', Timestamp('2017-02-11 00:00:00')),
(2038, 'B', Timestamp('2017-02-18 00:00:00')),
(2038, 'B', Timestamp('2017-02-20 00:00:00')),
(2120, 'A', Timestamp('2010-01-24 00:00:00'))]    

Is it the right approach to put all the IDs into a list and then compute the min and max dates?

d = {}
l = []

for r in abc:
    l.append(r)
    # Store only the first category and timestamp seen for each ID
    if r[0] not in d:
        d[r[0]] = r[1], [r[2]]

print(d)

1 Answer:

Answer 0: (score: 2)

Since you are already using pandas, you can use pd.DataFrame.groupby:

import pandas as pd

# Group by ID and category, then subtract the earliest date from the latest
res = pd.DataFrame(abc, columns=['Year', 'Category', 'Date'])\
        .groupby(['Year', 'Category'])['Date'].agg(lambda x: x.max() - x.min())\
        .reset_index()

print(res)

   Year Category    Date
0  2038        A 27 days
1  2038        B 27 days
2  2120        A  0 days
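
For comparison with the dictionary attempt in the question, here is a minimal sketch of the same grouping without pandas. It is not part of the original answer: the function name `date_spans` is hypothetical, and the shortened sample uses `datetime.datetime` so that it runs with the standard library alone. Since `pandas.Timestamp` is a subclass of `datetime`, the same function should work on the original `abc` list unchanged.

```python
from collections import defaultdict
from datetime import datetime

# Shortened sample in the same shape as `abc` from the question
# (datetime is used here instead of pandas.Timestamp; this is an assumption
# made only to keep the sketch free of third-party imports).
abc = [
    (2038, 'A', datetime(2010, 1, 24)),
    (2038, 'A', datetime(2010, 2, 6)),
    (2038, 'A', datetime(2010, 2, 20)),
    (2038, 'B', datetime(2017, 1, 24)),
    (2038, 'B', datetime(2017, 2, 20)),
    (2120, 'A', datetime(2010, 1, 24)),
]


def date_spans(rows):
    """Return (id, category, latest - earliest) for each (id, category) pair."""
    groups = defaultdict(list)
    for id_, category, ts in rows:
        # Collect every timestamp of the group, not just the first one seen
        groups[(id_, category)].append(ts)
    return [
        (id_, category, max(dates) - min(dates))
        for (id_, category), dates in groups.items()
    ]


spans = date_spans(abc)
for id_, category, span in spans:
    print(id_, category, span.days)  # spans of 27, 27 and 0 days
```

The key difference from the code in the question is that the dictionary is keyed on both the ID and the category and keeps every timestamp, so `max` and `min` see the whole group. If you prefer to stay in pandas, `list(res.itertuples(index=False, name=None))` should turn the grouped frame above into plain tuples.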