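I have to perform roughly a million conversions of this kind:
"Runtime": "01:12:00" --> datetime.time(1,12)
What is the most efficient way to do this? Right now I just split on the colon and call datetime.time(...):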
import datetime

text = '01:12:00'
# Split on ':' and convert each field to int
h, m, s = [int(i) for i in text.split(':')]
st = datetime.time(hour=h, minute=m, second=s)
Answer 0 (score: 4)
With the timeit module you can benchmark the different implementations yourself:
import datetime
import re

PAT = re.compile(r'(\d{2}):(\d{2}):(\d{2})')
TSTR = "01:12:00"

def fun1():
    # strptime; note this returns a datetime.datetime, not a datetime.time
    dt = datetime.datetime.strptime(TSTR, "%H:%M:%S")
    return dt

def fun2():
    # split on ':' and convert each field
    h, m, s = [int(i) for i in TSTR.split(':')]
    dt = datetime.time(hour=h, minute=m, second=s)
    return dt

def fun3():
    # precompiled regular expression
    mat = PAT.match(TSTR)
    dt = datetime.time(hour=int(mat.group(1)), minute=int(mat.group(2)), second=int(mat.group(3)))
    return dt

def fun4():
    # fixed-position slicing; relies on the zero-padded HH:MM:SS layout
    h, m, s = int(TSTR[0:2]), int(TSTR[3:5]), int(TSTR[6:8])
    dt = datetime.time(hour=h, minute=m, second=s)
    return dt

if __name__ == "__main__":
    import timeit
    # Use timeit.repeat's defaults: number=1000000; repeat is 3 (5 on Python 3.7+)
    print(min(timeit.repeat("fun1()", setup="from __main__ import fun1")))  # 15.5739
    print(min(timeit.repeat("fun2()", setup="from __main__ import fun2")))  # 3.4544
    print(min(timeit.repeat("fun3()", setup="from __main__ import fun3")))  # 4.1829
    print(min(timeit.repeat("fun4()", setup="from __main__ import fun4")))  # 2.8675
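The fastest approach is fun4 (slicing). Next comes the split approach, followed closely (surprisingly, imo) by the regex approach, with the strptime approach far behind.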
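To apply the slicing approach to real data rather than a module-level constant, it can be wrapped in a function that takes the string as an argument. The following is only a sketch: parse_runtime and the records list are hypothetical names, and the records are assumed to look like the "Runtime" field in the question, always zero-padded as HH:MM:SS.

```python
import datetime

def parse_runtime(text):
    """Parse a fixed-width 'HH:MM:SS' string by slicing (the fun4 technique)."""
    return datetime.time(int(text[0:2]), int(text[3:5]), int(text[6:8]))

# Hypothetical records shaped like the "Runtime" field in the question
records = [{"Runtime": "01:12:00"}, {"Runtime": "23:59:59"}]
times = [parse_runtime(r["Runtime"]) for r in records]
print(times)
```

Slicing trades robustness for speed: an input that is not zero-padded, such as "1:12:00", raises ValueError here, whereas the split version would still parse it.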
Answer 1 (score: 2)
>>> import time
>>> a='01:12:00'
>>> b=time.strptime(a,'%H:%M:%S') # use %I instead of %H if you use 12-hour clock
>>> b
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=1, tm_min=12, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=-1)
Then use b.tm_hour, b.tm_min and b.tm_sec to get the hours, minutes and seconds.
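Since the question asks for a datetime.time, the three fields of the struct_time can be passed straight to its constructor. A minimal sketch, where parsed and as_time are names chosen for illustration:

```python
import datetime
import time

parsed = time.strptime("01:12:00", "%H:%M:%S")

# Build the datetime.time the question asks for from the struct_time fields
as_time = datetime.time(parsed.tm_hour, parsed.tm_min, parsed.tm_sec)
print(as_time)  # 01:12:00
```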
Answer 2 (score: 2)
# dt is presumably datetime.datetime, e.g. imported with: from datetime import datetime as dt
In [48]: s = '"Runtime": "01:12:00"'
In [49]: dt.strptime(s, '"Runtime": "%H:%M:%S"')
Out[49]: datetime.datetime(1900, 1, 1, 1, 12)
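The result above is a datetime.datetime with a placeholder date of 1900-01-01. To get the datetime.time from the question, the date part can be dropped with .time(). A short sketch, assuming dt is an alias for datetime.datetime (the answer does not show its import):

```python
from datetime import datetime as dt  # assumed alias; not shown in the answer

s = '"Runtime": "01:12:00"'
parsed = dt.strptime(s, '"Runtime": "%H:%M:%S"')

# Drop the placeholder date and keep only the time of day
runtime = parsed.time()
print(repr(runtime))  # datetime.time(1, 12)
```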
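Answer 3 (score: 1)
I profiled the regex approach, the string.split-into-an-array approach, and the OP's approach.
Splitting into an array appears to be about 38% faster than the regex and about 15% faster than the OP's approach.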
import time
import re
import datetime

timestring = "01:12:00"

# Regex method
beforeMillis = int(round(time.time() * 1000))
for i in range(10000):
    result = re.search(r"(\d{2}):(\d{2}):(\d{2})", timestring).groups()
    theTime = datetime.time(int(result[0]), int(result[1]), int(result[2]))
afterMillis = int(round(time.time() * 1000))
print("Using Regex: " + str(afterMillis - beforeMillis) + "ms")

# str.split method, stored temporarily in an array
beforeMillis = int(round(time.time() * 1000))
for i in range(10000):
    result = timestring.split(":")
    theTime = datetime.time(int(result[0]), int(result[1]), int(result[2]))
afterMillis = int(round(time.time() * 1000))
print("Using Split: " + str(afterMillis - beforeMillis) + "ms")

# str.split method, stored temporarily in three variables (the OP's approach)
beforeMillis = int(round(time.time() * 1000))
for i in range(10000):
    h, m, s = [int(part) for part in timestring.split(':')]
    theTime = datetime.time(hour=h, minute=m, second=s)
afterMillis = int(round(time.time() * 1000))
print("Using Split with 3 Variables: " + str(afterMillis - beforeMillis) + "ms")
Output:
$ python test.py
Using Regex: 52ms
Using Split: 34ms
Using Split with 3 Variables: 44ms
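I don't think you will find anything faster than storing the split string in an array.
Storing it temporarily in an array is a little faster than using three variables, for a simple reason: no additional memory is needed, and the compiler can optimize it more easily.
All of the other answers (except the one recommending a regex) also fail to produce a datetime.time.
I would advise against using the built-in time module, because it represents a unix time (seconds since January 1, 1970) rather than a time of day.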
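The gap between the indexed-array variant and the three-variable variant is small, and single-run wall-clock measurements over 10000 iterations are noisy. One way to check it is to repeat the measurement with timeit and take the best run. This is only a sketch: split_indexed, split_unpacked and compare are hypothetical names, and the ranking may differ between Python versions and machines.

```python
import datetime
import timeit

TIMESTRING = "01:12:00"

def split_indexed():
    # Keep the split result in a list and index into it
    parts = TIMESTRING.split(":")
    return datetime.time(int(parts[0]), int(parts[1]), int(parts[2]))

def split_unpacked():
    # Convert in a list comprehension and unpack into three variables
    h, m, s = [int(part) for part in TIMESTRING.split(":")]
    return datetime.time(hour=h, minute=m, second=s)

def compare(number=100000, repeat=3):
    """Return the best-of-`repeat` time in seconds for each variant."""
    return {
        func.__name__: min(timeit.repeat(func, number=number, repeat=repeat))
        for func in (split_indexed, split_unpacked)
    }

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f}s")
```

Note that the two variants differ in more than where the fields are stored: the second one also builds an extra list in the comprehension and passes keyword arguments, either of which may account for part of the difference.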