I have the following df:
user_id step date
1 start 2018-04-17 15:27:07
1 step1 2018-04-17 15:28:07
1 end 2018-04-17 15:29:07
2 start 2018-05-17 15:28:07
2 step1 2018-05-17 15:29:07
2 end 2018-05-17 15:30:07
I need to convert it into the following table:
user_id start end time (end-start)
1 2018-04-17 15:27:07 2018-04-17 15:29:07 2
2 2018-05-17 15:28:07 2018-05-17 15:30:07 2
I'm stuck on this, and any help would be greatly appreciated.
Answer 0 (score: 2)
You can pivot the frame and then compute the timedelta:
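# Assumes df['date'] already holds datetimes, e.g. after df['date'] = pd.to_datetime(df['date'])
new_df = df.pivot(index='user_id', columns='step', values='date').drop(columns='step1').reset_index()
new_df.columns.name = None
# Elapsed time between start and end, in minutes (float)
new_df['time (end-start)'] = (new_df['end'] - new_df['start']).dt.total_seconds() / 60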
user_id end start time (end-start)
0 1 2018-04-17 15:29:07 2018-04-17 15:27:07 2.0
1 2 2018-05-17 15:30:07 2018-05-17 15:28:07 2.0
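For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pivot approach. The sample data is copied from the question; the names `wide` and `result` are only illustrative, and selecting `['start', 'end']` explicitly is my own choice so that the columns come out in the order the question asked for.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "step": ["start", "step1", "end", "start", "step1", "end"],
    "date": pd.to_datetime([
        "2018-04-17 15:27:07", "2018-04-17 15:28:07", "2018-04-17 15:29:07",
        "2018-05-17 15:28:07", "2018-05-17 15:29:07", "2018-05-17 15:30:07",
    ]),
})

# One row per user, one column per step
wide = df.pivot(index="user_id", columns="step", values="date")

# Keep only start and end, in that order, and flatten the index
result = wide[["start", "end"]].reset_index()
result.columns.name = None

# Elapsed time in minutes as a float
result["time (end-start)"] = (result["end"] - result["start"]).dt.total_seconds() / 60
print(result)
```

Note that `pivot` raises an error if a user has more than one row for the same step, so this only works when each (user_id, step) pair is unique.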
Edit: for a DataFrame with duplicate entries, such as the following:
user_id step date
0 1 start 2018-04-17 15:27:07
1 1 step1 2018-04-17 15:28:07
2 1 end 2018-04-17 15:29:07
3 1 end 2018-04-17 15:32:07
4 2 start 2018-05-17 15:26:07
5 2 start 2018-05-17 15:28:07
6 2 step1 2018-05-17 15:29:07
7 2 end 2018-05-17 15:30:07
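use pivot_table with aggfunc = 'first', which keeps the first value found for each user_id/step pair:
# Assumes df['date'] already holds datetimes, e.g. after df['date'] = pd.to_datetime(df['date'])
new_df = df.pivot_table(index='user_id', columns='step', values='date', aggfunc='first').drop(columns='step1').reset_index()
new_df.columns.name = None
# Elapsed time between start and end, in minutes (float)
new_df['time (end-start)'] = (new_df['end'] - new_df['start']).dt.total_seconds() / 60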
You get:
user_id end start time (end-start)
0 1 2018-04-17 15:29:07 2018-04-17 15:27:07 2.0
1 2 2018-05-17 15:30:07 2018-05-17 15:26:07 4.0
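Here is a self-contained sketch of the duplicate-entry case, again with the sample data copied from above and illustrative names (`wide`, `result`). One assumption on my part: since 'first' depends on row order, I sort by date beforehand so that it always means the earliest timestamp per user and step. With this sample data the result is the same either way.

```python
import pandas as pd

# Sample data with duplicate (user_id, step) rows, copied from the edit above
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "step": ["start", "step1", "end", "end", "start", "start", "step1", "end"],
    "date": pd.to_datetime([
        "2018-04-17 15:27:07", "2018-04-17 15:28:07",
        "2018-04-17 15:29:07", "2018-04-17 15:32:07",
        "2018-05-17 15:26:07", "2018-05-17 15:28:07",
        "2018-05-17 15:29:07", "2018-05-17 15:30:07",
    ]),
})

# Sort by date so that "first" is the earliest timestamp per (user_id, step)
wide = df.sort_values("date").pivot_table(
    index="user_id", columns="step", values="date", aggfunc="first"
)

# Keep only start and end, in that order, and flatten the index
result = wide[["start", "end"]].reset_index()
result.columns.name = None

# Elapsed time in minutes as a float
result["time (end-start)"] = (result["end"] - result["start"]).dt.total_seconds() / 60
print(result)
```

If you would rather measure from the earliest start to the latest end, a different aggregation would be needed, and the numbers would differ from the output shown above.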