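I'm trying to subtract two columns in a CSV to create a third column, "Duration" (End_time - Start_time).
Each row also corresponds to a user ID.
I can create a separate CSV file containing only the Duration column, but I'd rather write it back into the original CSV.
The times are formatted like this: 2016-11-12 01:25:24+00 - 2016-11-12 01:25:20+00
This is what I have so far: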
start_stop_sessions = pd.read_csv("start_stop_sessions.csv", parse_dates=['time_x', 'time_y'])
start_stop_sessions['time_delta'] = start_stop_sessions.time_y.values - start_stop_sessions.time_x.values
Duration = (start_stop_sessions.time_delta)
print (Duration)
sys.stdout = open('Duration.csv', 'w')
Durationlist = ("Duration.csv")
max_value = max(Durationlist)
min_value = min(Durationlist)
Am I doing this right?
Test data:
time_x, anonymous_id, time_y
2016-11-20 18:35:57+00, 1, 2016-11-20 19:03:31+00
2016-11-21 19:33:06+00, 2, 2016-11-21 19:45:47+00
2016-11-21 19:22:52+00, 3, 2016-11-21 19:26:02+00
1) I need to create a fourth column, Duration
2) I need a list of the MIN, MAX, and AVG of this Duration column
Answer 0 (score: 2)
I think you need to_csv to write the DataFrame back to a CSV file:
df = pd.read_csv("start_stop_sessions.csv", parse_dates=['time_x','time_y'])
df['Duration'] = df['time_y'] - df['time_x']
#same as
#df['Duration'] = df['time_y'].sub(df['time_x'])
print (df)
                time_x  anonymous_id               time_y  Duration
0  2016-11-20 18:35:57             1  2016-11-20 19:03:31  00:27:34
1  2016-11-21 19:33:06             2  2016-11-21 19:45:47  00:12:41
2  2016-11-21 19:22:52             3  2016-11-21 19:26:02  00:03:10
df.to_csv('start_stop_sessions.csv', index=False)
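One thing to watch for: the test data in the question has a space after each comma, so the header would be read as ' time_y' rather than 'time_y'. Below is a minimal sketch of the read, subtract, and overwrite steps that assumes that layout and passes skipinitialspace=True to deal with it. The function name add_duration_in_place and the temporary file are made up for illustration, so the sketch never touches your real file.

```python
import os
import tempfile

import pandas as pd

# Sample rows in the layout shown in the question (note the space after each comma).
SAMPLE = (
    "time_x, anonymous_id, time_y\n"
    "2016-11-20 18:35:57+00, 1, 2016-11-20 19:03:31+00\n"
    "2016-11-21 19:33:06+00, 2, 2016-11-21 19:45:47+00\n"
    "2016-11-21 19:22:52+00, 3, 2016-11-21 19:26:02+00\n"
)


def add_duration_in_place(path):
    """Read the CSV at `path`, add a Duration column, and overwrite the same file."""
    # skipinitialspace=True drops the blank after each comma, so the header
    # is read as 'time_y' instead of ' time_y'.
    df = pd.read_csv(path, skipinitialspace=True, parse_dates=["time_x", "time_y"])
    df["Duration"] = df["time_y"] - df["time_x"]
    df.to_csv(path, index=False)
    return df


# Hypothetical throwaway file, used only so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "start_stop_sessions.csv")
    with open(csv_path, "w") as fh:
        fh.write(SAMPLE)
    result = add_duration_in_place(csv_path)
    # Re-read the file to confirm the new column was written back.
    columns_on_disk = pd.read_csv(csv_path).columns.tolist()

print(result["Duration"])
print(columns_on_disk)
```

If your real file has no spaces after the commas, skipinitialspace=True is harmless and can stay.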
Then get the min, max, and mean of the Duration column. The output is a timedelta:
print (df['Duration'].min())
0 days 00:03:10
print (df['Duration'].max())
0 days 00:27:34
print (df['Duration'].mean())
0 days 00:14:28.333333
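Since the question asks for the MIN, MAX, and AVG as one list, here is a small sketch that collects all three in a single call with Series.agg. It rebuilds the three sample durations by hand so it runs on its own; the names durations, summary, and summary_list are only illustrative.

```python
import pandas as pd

# The three durations from the sample output, rebuilt by hand for illustration.
durations = pd.Series(
    pd.to_timedelta(["00:27:34", "00:12:41", "00:03:10"]),
    name="Duration",
)

# One call returns a Series indexed by 'min', 'max', and 'mean'.
summary = durations.agg(["min", "max", "mean"])
print(summary)

# A plain Python list of Timedelta values, if that shape is more convenient.
summary_list = summary.tolist()
```

With a real DataFrame the same call would be df['Duration'].agg(['min', 'max', 'mean']).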
If you need the timedelta converted to seconds, use total_seconds:
df['Duration'] = (df['time_y'] - df['time_x']).dt.total_seconds()
print (df)
                time_x  anonymous_id               time_y  Duration
0  2016-11-20 18:35:57             1  2016-11-20 19:03:31    1654.0
1  2016-11-21 19:33:06             2  2016-11-21 19:45:47     761.0
2  2016-11-21 19:22:52             3  2016-11-21 19:26:02     190.0
df.to_csv('start_stop_sessions.csv', index=False)
print (df['Duration'].min())
190.0
print (df['Duration'].max())
1654.0
print (df['Duration'].mean())
868.3333333333334