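I have a pandas DataFrame in which one column (LOG_TIMES) contains lists. How can I create a new column holding the time differences, in seconds, between consecutive list elements?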
DATE_RECORDED PERSON LOG_TIMES
0 2018-03-22 11:58:23.585 JOHN [15/03/2018 10:30:48, 15/03/2018 10:29:48, ...
1 2018-03-22 11:58:23.585 JOHN [20/03/2018 14:28:36, 20/03/2018 14:26:36, ...
The expected output is a DataFrame in which that column shows the time differences (values in seconds):
DATE_RECORDED PERSON LOG_TIMES
0 2018-03-22 11:58:23.585 JOHN [60, ...
1 2018-03-22 11:58:23.585 JOHN [120, ...
Answer 0 (score: 1)
For df:
DATE_RECORDED PERSON LOG_TIMES
0 2018-03-22 11:58:23.585 JOHN [15/03/2018 10:30:48, 15/03/2018 10:29:48]
1 2018-03-22 11:58:23.585 JOHN [20/03/2018 14:28:36, 20/03/2018 14:26:36, 20/03/2018 14:26:30]
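You need:
import datetime as dt
import pandas as pd
# Assumes each LOG_TIMES value is a string such as "[15/03/2018 10:30:48, 15/03/2018 10:29:48]".
# diff() gives current minus previous; the times are newest-first, so multiply by -1.
df['LOG_TIMES'] = df['LOG_TIMES'].apply(lambda x: list(pd.Series([dt.datetime.strptime(y.strip(), '%d/%m/%Y %H:%M:%S') for y in x[1:-1].split(',')]).diff().dt.total_seconds().dropna().mul(-1)))
Output: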
DATE_RECORDED PERSON LOG_TIMES
0 2018-03-22 11:58:23.585 JOHN [60.0]
1 2018-03-22 11:58:23.585 JOHN [120.0, 6.0]
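For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes, as the answer does, that each LOG_TIMES value is a bracketed string of newest-first timestamps; the helper name seconds_between is hypothetical and the sample frame is rebuilt by hand from the rows shown above.

```python
import pandas as pd


def seconds_between(raw):
    """Return the gaps, in seconds, between consecutive timestamps in a
    bracketed string like "[15/03/2018 10:30:48, 15/03/2018 10:29:48]"."""
    # Drop the surrounding brackets, split on commas, and ignore empty pieces.
    parts = [p.strip() for p in raw.strip()[1:-1].split(',') if p.strip()]
    times = pd.Series(pd.to_datetime(parts, format='%d/%m/%Y %H:%M:%S'))
    # diff() is "current minus previous"; the sample data is newest-first,
    # so negate to get positive gaps. The first diff is NaN and is dropped.
    return (-times.diff().dt.total_seconds()).dropna().tolist()


# Sample frame rebuilt from the rows shown in the answer.
df = pd.DataFrame({
    'DATE_RECORDED': pd.to_datetime(['2018-03-22 11:58:23.585'] * 2),
    'PERSON': ['JOHN', 'JOHN'],
    'LOG_TIMES': [
        '[15/03/2018 10:30:48, 15/03/2018 10:29:48]',
        '[20/03/2018 14:28:36, 20/03/2018 14:26:36, 20/03/2018 14:26:30]',
    ],
})

df['LOG_TIMES'] = df['LOG_TIMES'].apply(seconds_between)
print(df['LOG_TIMES'].tolist())
```

A named function is easier to test and to adapt (for example, to a different timestamp format) than a long lambda, but the logic is the same.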
If df['LOG_TIMES'] already holds lists of datetime objects, you can simply use:
df['LOG_TIMES'].apply(lambda x: list(pd.Series(x).diff().dt.total_seconds().dropna().mul(-1)))
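When the lists already contain datetime objects, the same result can also be computed without building a pandas Series per row. The following is only a sketch of that alternative, not part of the original answer; the helper name gaps_in_seconds is hypothetical, and it assumes the timestamps are ordered newest-first as in the sample data.

```python
import datetime as dt


def gaps_in_seconds(times):
    """Seconds between each timestamp and the one that follows it."""
    # zip pairs every element with its successor; subtracting the later
    # list entry from the earlier one yields positive gaps for newest-first data.
    return [(a - b).total_seconds() for a, b in zip(times, times[1:])]


sample = [
    dt.datetime(2018, 3, 20, 14, 28, 36),
    dt.datetime(2018, 3, 20, 14, 26, 36),
    dt.datetime(2018, 3, 20, 14, 26, 30),
]
print(gaps_in_seconds(sample))  # [120.0, 6.0]
```

It would be applied with df['LOG_TIMES'].apply(gaps_in_seconds), and it returns an empty list for lists with fewer than two timestamps.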