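I'm trying to write a function that iterates over an array and builds a new array. Using timeit, I found that the slowest part is the loop over the numpy array. Because the arrays I use as input tend to be very long, I'd like to speed this up as much as possible.
Is there a way to make the list-comprehension loop faster? Here is a function that reproduces the problem: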
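import numpy as np
from datetime import datetime

def get_days(year, month):
    # Days per month in a non-leap year
    months = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
    if month == 2:
        # Gregorian leap-year rule
        if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):
            return 29
    return months[month - 1]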
This is the array computation that needs better performance:
res=np.arange(20788, 20940)
np.array([np.min([x+datetime.fromtimestamp(20809*24*60*60).day-1, x+get_days(datetime.fromtimestamp(20809*24*60*60).year, datetime.fromtimestamp(20809*24*60*60).month)]) for x in res])
Answer 0 (score: 2)
Instead of a list comprehension and a loop, use numpy functions and vectorization.
b = np.array([np.min([x+datetime.fromtimestamp(20809*24*60*60).day-1,
                      x+get_days(datetime.fromtimestamp(20809*24*60*60).year,
                                 datetime.fromtimestamp(20809*24*60*60).month)])
              for x in res])
c = np.minimum(res+datetime.fromtimestamp(20809*24*60*60).day-1,
               res+get_days(datetime.fromtimestamp(20809*24*60*60).year,
                            datetime.fromtimestamp(20809*24*60*60).month))
b == c
Output:
array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True])
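If only a single yes/no answer is needed rather than the full element-wise array, the comparison can be collapsed with np.array_equal. A minimal sketch, assuming fixed stand-in values (day and days_in_month are hypothetical names, not from the original code) so that it runs without the datetime calls:

```python
import numpy as np

res = np.arange(20788, 20940)

# Hypothetical stand-ins for the values derived from the datetime in the answer.
day, days_in_month = 22, 31

# Loop version and vectorized version of the same computation.
b = np.array([np.min([x + day - 1, x + days_in_month]) for x in res])
c = np.minimum(res + day - 1, res + days_in_month)

# np.array_equal checks shape and all elements, returning one boolean.
same = np.array_equal(b, c)
print(same)
```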
%timeit b = np.array([np.min([x+datetime.fromtimestamp(20809*24*60*60).day-1, x+get_days(datetime.fromtimestamp(20809*24*60*60).year, datetime.fromtimestamp(20809*24*60*60).month)]) for x in res])
1.99 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit c = np.minimum(res+datetime.fromtimestamp(20809*24*60*60).day-1, res+get_days(datetime.fromtimestamp(20809*24*60*60).year, datetime.fromtimestamp(20809*24*60*60).month))
10.5 µs ± 310 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
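A possible further simplification, offered as a sketch and not as part of the original answer: the same x is added to both arguments, and min(x + a, x + b) equals x + min(a, b), so the whole expression reduces to one scalar offset added to the array. The sketch below assumes the timestamp is interpreted in UTC (the original uses local time, so the day may differ by one on some machines) and swaps in the standard library's calendar.monthrange for get_days.

```python
import calendar
from datetime import datetime, timezone

import numpy as np

# Assumption: use UTC so the result does not depend on the local time zone.
dt = datetime.fromtimestamp(20809 * 24 * 60 * 60, tz=timezone.utc)

# calendar.monthrange returns (weekday of the 1st, number of days in the month).
days_in_month = calendar.monthrange(dt.year, dt.month)[1]

# min(x + a, x + b) == x + min(a, b): compute the scalar offset only once.
offset = min(dt.day - 1, days_in_month)

res = np.arange(20788, 20940)
fast = res + offset

# Reference: the element-wise formulation from the question.
slow = np.array([np.min([x + dt.day - 1, x + days_in_month]) for x in res])
assert np.array_equal(fast, slow)
```

Because dt.day - 1 is always smaller than the number of days in the month, the offset here is simply dt.day - 1; the min call is kept only to mirror the original expression.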
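Answer 1 (score: 0)
As @botje commented: note that some values are recomputed every time the function is called inside the list comprehension. When I declared those variables outside the function, I managed to make it faster. My code is as follows: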
import numpy as np
from datetime import datetime
# calc_execution_time is my own helper, defined further below
from helpers.time_dec import calc_execution_time

months = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
# Compute the datetime once instead of once per element
dt = datetime.fromtimestamp(20809 * 24 * 60 * 60)
dt_day = dt.day

def get_days(year, month):
    if month == 2:
        if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):
            return 29
    return months[month - 1]

d = get_days(dt.year, dt.month)

@calc_execution_time
def calc():
    res = np.arange(20788, 20940)
    r = np.array([np.min([x + dt_day - 1,
                          x + d]) for x in res])
    return r

print(calc())  # 0.0011 seconds; your code took 0.0026 seconds, so the performance is clearly better now
################### this is the test execution time function ###############
from timeit import default_timer

def calc_execution_time(func):
    """Calculate the execution time of a function."""
    def wrapper(*args, **kwargs):
        before = default_timer()
        res = func(*args, **kwargs)
        after = default_timer()
        execution_time = after - before
        print(f"execution time of the Function {func.__qualname__} is :=> {execution_time} seconds")
        return res
    return wrapper
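A single wall-clock measurement like the decorator above can be noisy. As a rough sketch of a more repeatable comparison, the standard timeit module can run each variant many times and keep the best result. The names loop_version, vectorized_version, and best_time are hypothetical, and dt_day and d are fixed stand-in values so the snippet is self-contained.

```python
import timeit

import numpy as np

res = np.arange(20788, 20940)
dt_day, d = 22, 31  # hypothetical stand-ins for the datetime-derived values

def loop_version():
    # List comprehension with hoisted variables, as in this answer.
    return np.array([np.min([x + dt_day - 1, x + d]) for x in res])

def vectorized_version():
    # np.minimum applied to whole arrays, as in the other answer.
    return np.minimum(res + dt_day - 1, res + d)

def best_time(func, number=100, repeat=5):
    # Best per-call time in seconds over `repeat` batches of `number` calls.
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

loop_t = best_time(loop_version)
vec_t = best_time(vectorized_version)
print(f"loop: {loop_t:.2e} s per call, vectorized: {vec_t:.2e} s per call")
```

Taking the minimum over several batches is a common convention because slower runs usually reflect interference from other processes, not the code itself.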
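You can also use the map function. I'm not sure what your goal is, but I think you could change the function to use map instead of a list comprehension. It returns a lazy map object (an iterator) rather than a list, so the code looks like this: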
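@calc_execution_time
def calc():
    res = np.arange(20788, 20940)
    #r = np.array([np.min([x + dt_day - 1, x +d]) for x in res])
    r = map(lambda x: np.min([x + dt_day - 1, x + d]), res)
    return r

# 1.65e-05 seconds; note that this times only the creation of the lazy map object,
# the actual per-element work happens afterwards inside list()
print(list(calc()))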
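To see where the time actually goes with map, creation and consumption can be timed separately. This is a minimal sketch under the assumption of fixed stand-in values for dt_day and d; the variable names create_time and consume_time are illustrative only.

```python
from timeit import default_timer

import numpy as np

res = np.arange(20788, 20940)
dt_day, d = 22, 31  # hypothetical stand-ins for the datetime-derived values

# Step 1: building the map object does not call the lambda at all.
start = default_timer()
lazy = map(lambda x: np.min([x + dt_day - 1, x + d]), res)
create_time = default_timer() - start

# Step 2: list() pulls every element through the lambda; this is the real work.
start = default_timer()
values = list(lazy)
consume_time = default_timer() - start

print(f"create: {create_time:.2e} s, consume: {consume_time:.2e} s")

# A map object is a one-shot iterator: once consumed, it yields nothing more.
leftover = list(lazy)
```

A fair comparison with the list comprehension would therefore time the list(...) call as well, not just the call that returns the map object.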