Differences between list elements inside a DataFrame

Posted: 2018-06-07 19:34:49

Tags: python pandas

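I have a pandas DataFrame in which one column (LOG_TIMES) contains lists. How can I create a new column containing the time differences (in seconds) between consecutive elements of each list?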

    DATE_RECORDED           PERSON  LOG_TIMES
0   2018-03-22 11:58:23.585 JOHN    [15/03/2018 10:30:48, 15/03/2018 10:29:48, ...
1   2018-03-22 11:58:23.585 JOHN    [20/03/2018 14:28:36, 20/03/2018 14:26:36, ...

The expected output is a DataFrame in which that column shows the time differences (values in seconds):

    DATE_RECORDED           PERSON  LOG_TIMES
0   2018-03-22 11:58:23.585 JOHN    [60, ...
1   2018-03-22 11:58:23.585 JOHN    [120, ...

1 answer:

Answer 0 (score: 1)

Given this df:

                 DATE_RECORDED PERSON                                                        LOG_TIMES
0  2018-03-22 11:58:23.585   JOHN                       [15/03/2018 10:30:48, 15/03/2018 10:29:48]
1  2018-03-22 11:58:23.585   JOHN  [20/03/2018 14:28:36, 20/03/2018 14:26:36, 20/03/2018 14:26:30]
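To try the code below, the input can be rebuilt as follows. This is a sketch that assumes each LOG_TIMES cell is a string shaped like a list (not a real Python list), because the parsing step `x[1:-1].split(',')` treats the cell that way.

```python
import pandas as pd

# Assumption: LOG_TIMES cells are strings that look like lists of
# 'dd/mm/YYYY HH:MM:SS' timestamps, newest first.
df = pd.DataFrame({
    'DATE_RECORDED': pd.to_datetime(['2018-03-22 11:58:23.585'] * 2),
    'PERSON': ['JOHN', 'JOHN'],
    'LOG_TIMES': [
        '[15/03/2018 10:30:48, 15/03/2018 10:29:48]',
        '[20/03/2018 14:28:36, 20/03/2018 14:26:36, 20/03/2018 14:26:30]',
    ],
})
print(df)
```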

You can use:

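    import datetime as dt
    import pandas as pd

    # Strip the brackets, parse each timestamp, diff consecutive times in seconds,
    # drop the leading NaN and negate (the timestamps are in descending order)
    df['LOG_TIMES'] = df['LOG_TIMES'].apply(lambda x: list(pd.Series([dt.datetime.strptime(y.strip(), '%d/%m/%Y %H:%M:%S') for y in x[1:-1].split(',')]).diff().dt.total_seconds().dropna().mul(-1)))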

输出:

    DATE_RECORDED PERSON       LOG_TIMES
0  2018-03-22 11:58:23.585   JOHN         [60.0]
1  2018-03-22 11:58:23.585   JOHN  [120.0, 6.0]
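The same differences can also be computed without building a temporary Series for every row. The sketch below is an alternative, not the answer's method: it parses each string with plain `datetime.strptime` and pairs consecutive timestamps with `zip`. The helper name `log_time_diffs` and the `LOG_DIFFS` output column are hypothetical, and it again assumes string cells with timestamps in descending order.

```python
import datetime as dt
import pandas as pd

FMT = '%d/%m/%Y %H:%M:%S'

def log_time_diffs(cell):
    """Hypothetical helper: seconds between consecutive timestamps in a
    '[t1, t2, ...]' string, computed as earlier minus later (times descend)."""
    times = [dt.datetime.strptime(s.strip(), FMT) for s in cell.strip('[]').split(',')]
    # Pair each timestamp with the next one; a single timestamp yields []
    return [(a - b).total_seconds() for a, b in zip(times, times[1:])]

df = pd.DataFrame({
    'PERSON': ['JOHN', 'JOHN'],
    'LOG_TIMES': [
        '[15/03/2018 10:30:48, 15/03/2018 10:29:48]',
        '[20/03/2018 14:28:36, 20/03/2018 14:26:36, 20/03/2018 14:26:30]',
    ],
})
# Write to a new column so the original strings are kept
df['LOG_DIFFS'] = df['LOG_TIMES'].apply(log_time_diffs)
print(df['LOG_DIFFS'].tolist())  # [[60.0], [120.0, 6.0]]
```

Writing to a separate column matches the question's wording ("a new column"); assign back to LOG_TIMES instead if the raw strings are no longer needed.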

If df['LOG_TIMES'] already holds lists of datetime objects, you can simply use:

    # Cells already contain datetime objects, so no string parsing is needed
    df['LOG_TIMES'] = df['LOG_TIMES'].apply(lambda x: list(pd.Series(x).diff().dt.total_seconds().dropna().mul(-1)))
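A self-contained version of the datetime-list case, as a sketch assuming each cell holds `datetime.datetime` objects in descending order. `.dt.total_seconds()` turns the Timedelta differences into float seconds; the `LOG_DIFFS` column name is hypothetical.

```python
import datetime as dt
import pandas as pd

# Assumed input: LOG_TIMES cells are real lists of datetime objects, newest first
df = pd.DataFrame({
    'PERSON': ['JOHN', 'JOHN'],
    'LOG_TIMES': [
        [dt.datetime(2018, 3, 15, 10, 30, 48), dt.datetime(2018, 3, 15, 10, 29, 48)],
        [dt.datetime(2018, 3, 20, 14, 28, 36), dt.datetime(2018, 3, 20, 14, 26, 36),
         dt.datetime(2018, 3, 20, 14, 26, 30)],
    ],
})
# diff() yields Timedeltas (first one NaT); total_seconds() gives floats, dropna()
# removes the leading NaN, and mul(-1) makes the gaps positive
df['LOG_DIFFS'] = df['LOG_TIMES'].apply(
    lambda x: list(pd.Series(x).diff().dt.total_seconds().dropna().mul(-1)))
print(df['LOG_DIFFS'].tolist())
```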