I am using a script to interpolate stop times, converting them from HH:MM:SS format to integer minute values. The script is as follows.
import numpy as np
import pandas as pd

# read in new csv file
reindexed = pd.read_csv('output/stop_times.csv')
for col in ('arrival_time', 'departure_time'):
    # extract hh:mm:ss values
    df = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    # convert to minutes since midnight (seconds are ignored)
    reindexed[col] = df['hour'] * 60 + df['minute']
    # interpolate missing values, then round to two decimals
    reindexed[col] = reindexed[col].interpolate()
    reindexed[col] = np.round(reindexed[col], decimals=2)
reindexed.to_csv('output/stop_times.csv', index=False)
# convert minutes back to HH:MM:SS
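What I want now is to convert these values back to HH:MM:SS format, but I am having trouble figuring out how. I have a hunch that the method is hidden somewhere in the timeseries documentation, but I have come up short so far.
Below is a sample CSV derived from the larger stop_times.csv file I am working with. The arrival_time and departure_time columns are the ones of interest: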
stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,trip_id,arrival_time,departure_time,stop_sequence,pickup_type,drop_off_type,stop_headsign
02303,02303,LCC Station Bay C,lcc_c,44.00981229999999,-123.0351463,,99994.0,1475360,707.0,707.0,1,0,0,82 EUGENE STATION
01092,01092,N/S of 30th E of University,,44.0242826,-123.07484540000002,,,1475360,709.67,709.67,2,0,0,82 EUGENE STATION
01089,01089,N/S of 30th W of Alder,,44.0242545,-123.08092409999999,,,1475360,712.33,712.33,3,0,0,82 EUGENE STATION
01409,01409,"Amazon Station, Bay A",amz_a,44.026660799999995,-123.08448870000001,,99993.0,1475360,715.0,715.0,4,0,0,82 EUGENE STATION
01222,01222,E/S of Amazon Prkwy N of 24th,,44.0339371,-123.0887632,,,1475360,715.75,715.75,5,0,0,82 EUGENE STATION
01548,01548,E/S of Amazon Pkwy S of 19th,,44.038014700000005,-123.0896553,,,1475360,716.5,716.5,6,0,0,82 EUGENE STATION
Here is a reference for deriving an HH:MM:SS value from a time value in minutes:
78.6 minutes can be converted to hours by dividing 78.6 minutes / 60 minutes/hour = 1.31 hours
1.31 hours can be broken down to 1 hour plus 0.31 hours - 1 hour
0.31 hours * 60 minutes/hour = 18.6 minutes - 18 minutes
0.6 minutes * 60 seconds/minute = 36 seconds - 36 seconds
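The arithmetic above can be sketched in plain Python. This is only an illustration: the function name minutes_to_hms is hypothetical, and rounding to the nearest whole second is an assumption made here to avoid floating-point surprises such as 78.6 * 60 landing just below 4716.

```python
def minutes_to_hms(minutes):
    """Format a duration given in minutes as an HH:MM:SS string."""
    # Assumption: round to the nearest whole second before splitting.
    total_seconds = int(round(minutes * 60))
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return '{:02d}:{:02d}:{:02d}'.format(hours, mins, secs)


print(minutes_to_hms(78.6))  # 01:18:36, matching the worked example
```

Hours are not wrapped at 24 here, so a value such as 1500 minutes comes out as 25:00:00.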
Any help is greatly appreciated. Thanks in advance!
Answer 0 (score: 3)
Per the previous question, perhaps the best thing to do is to keep the original HH:MM:SS strings.
So instead of
for col in ('arrival_time', 'departure_time'):
    df = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    reindexed[col] = df['hour'] * 60 + df['minute']
use
for col in ('arrival_time', 'departure_time'):
    newcol = '{}_minutes'.format(col)
    df = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    reindexed[newcol] = df['hour'] * 60 + df['minute']
Then you do not have to do any new computation to recover the HH:MM:SS strings.
reindexed['arrival_time']
still holds the original HH:MM:SS strings, and
reindexed['arrival_time_minutes']
holds the durations in minutes.
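As a minimal sketch of this two-column layout, the snippet below uses a made-up four-row frame (not the real stop_times.csv) in which two stops have no time. The interpolation step is taken from the question's script. Note that the rows that were missing in the original column stay missing there, so only the new minutes column carries the interpolated values.

```python
import pandas as pd

# Hypothetical sample: two known times with two missing stops in between.
reindexed = pd.DataFrame(
    {'arrival_time': ['11:47:00', None, None, '11:55:00']})

col = 'arrival_time'
newcol = '{}_minutes'.format(col)
parts = reindexed[col].str.extract(
    r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
# Minutes since midnight; rows without a time stay NaN.
reindexed[newcol] = parts['hour'] * 60 + parts['minute']
# Fill the gaps linearly, in the new column only.
reindexed[newcol] = reindexed[newcol].interpolate().round(2)

print(reindexed)
```

The interpolated minutes here are 707.0, 709.67, 712.33 and 715.0, the same values that appear in the question's sample CSV.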
Building on Jianxun Li's solution, to chop off the microseconds you can multiply the minutes by 60 and then call astype(int):
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.rand(3) * 1000, columns=['minutes'])
df['HH:MM:SS'] = pd.to_timedelta((60*df['minutes']).astype('int'), unit='s')
yields
minutes HH:MM:SS
0 548.813504 09:08:48
1 715.189366 11:55:11
2 602.763376 10:02:45
Note that the df['HH:MM:SS'] column contains pd.Timedelta objects:
In [240]: df['HH:MM:SS'].iloc[0]
Out[240]: Timedelta('0 days 09:08:48')
However, if you try to store this data in a CSV
In [223]: df.to_csv('/tmp/out', date_format='%H:%M:%S')
you get:
,minutes,HH:MM:SS
0,548.813503927,0 days 09:08:48.000000000
1,715.189366372,0 days 11:55:11.000000000
2,602.763376072,0 days 10:02:45.000000000
If the minutes values are too large, you will also see days as part of the string representation of the timedelta:
np.random.seed(0)
df = pd.DataFrame(np.random.rand(3) * 10000, columns=['minutes'])
df['HH:MM:SS'] = pd.to_timedelta((60*df['minutes']).astype('int'), unit='s')
yields
minutes HH:MM:SS
0 5488.135039 3 days 19:28:08
1 7151.893664 4 days 23:11:53
2 6027.633761 4 days 04:27:38
This may not be what you want. In that case, instead of
df['HH:MM:SS'] = pd.to_timedelta((60*df['minutes']).astype('int'), unit='s')
per Phillip Cloud's solution you could use
import operator
fmt = operator.methodcaller('strftime', '%H:%M:%S')
df['HH:MM:SS'] = pd.to_datetime(df['minutes'], unit='m').map(fmt)
The result looks the same, but now the df['HH:MM:SS'] column contains strings:
In [244]: df['HH:MM:SS'].iloc[0]
Out[244]: '09:08:48'
Note that this chops off (omits) both whole days and microseconds. Writing the DataFrame to CSV
In [229]: df.to_csv('/tmp/out', date_format='%H:%M:%S')
now yields
,minutes,HH:MM:SS
0,548.813503927,09:08:48
1,715.189366372,11:55:11
2,602.763376072,10:02:45
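Because the strftime approach drops whole days, a duration of 24 hours or more wraps around: 5488.135 minutes would come out as 19:28:08. If the hours should instead keep counting past 24, one possible sketch is to do the integer arithmetic directly on the seconds. This is an alternative to both methods above, not part of either cited solution, and the sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [548.813504, 5488.135039]})

# Truncate to whole seconds, as the astype('int') approach above does.
seconds = (60 * df['minutes']).astype('int64')
hours = seconds // 3600          # not wrapped at 24
mins = seconds % 3600 // 60
secs = seconds % 60
df['HH:MM:SS'] = ['{:02d}:{:02d}:{:02d}'.format(h, m, s)
                  for h, m, s in zip(hours, mins, secs)]

print(df)
```

The second row becomes 91:28:08, which is the same duration as the "3 days 19:28:08" shown earlier (3 * 24 + 19 = 91). The column holds plain strings, so it is written to CSV unchanged. This assumes the minutes column has no missing values, since astype('int64') fails on NaN.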
Answer 1 (score: 2)
You may want to consider using pd.to_timedelta.
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.rand(10) * 1000, columns=['time_in_minutes'])
Out[94]:
time_in_minutes
0 548.8135
1 715.1894
2 602.7634
3 544.8832
4 423.6548
5 645.8941
6 437.5872
7 891.7730
8 963.6628
9 383.4415
# As Jeff suggests, pd.to_timedelta is a very handy tool to do this
df['time_delta'] = pd.to_timedelta(df.time_in_minutes, unit='m')
Out[96]:
time_in_minutes time_delta
0 548.8135 09:08:48.810235
1 715.1894 11:55:11.361982
2 602.7634 10:02:45.802564
3 544.8832 09:04:52.990979
4 423.6548 07:03:39.287960
5 645.8941 10:45:53.646784
6 437.5872 07:17:35.232675
7 891.7730 14:51:46.380046
8 963.6628 16:03:39.765630
9 383.4415 06:23:26.491129
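If the fractional seconds are unwanted, one option (a sketch added here, not part of the original answer) is to floor the timedeltas to whole seconds with the .dt.floor accessor method. The two sample values are taken from the table above.

```python
import pandas as pd

df = pd.DataFrame({'time_in_minutes': [548.8135, 715.1894]})

# Convert minutes to timedeltas, then drop the sub-second part.
df['time_delta'] = pd.to_timedelta(
    df['time_in_minutes'], unit='m').dt.floor('s')

print(df)
```

The column still holds Timedelta values rather than strings, so writing it to CSV will still include the "0 days" prefix.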