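I have several nested for loops, and the innermost one is executed many times. That innermost loop contains some heavy computation using numpy, so the whole thing takes a long time. I am therefore trying to optimize the innermost loop.
I have two numpy arrays (much larger in real life):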
left = np.asarray([0.4, 0.2, 0.2, 0.7, 0.6, 0.2, 0.3])
right= np.asarray([0.2, 0.7, 0.3, 0.2, 0.1, 0.9, 0.7])
These are compared against thresholds to decide whether I go left or right.
If left[x] > 0.55 and right[x] < 0.45, I want to go left.
If right[x] > 0.55 and left[x] < 0.45, I want to go right.
I solved this by creating two boolean arrays, one for left and one for right, as follows:
leftListBool = ((left > 0.55)*1 + (right < 0.45)*1 - 1) > 0
rightListBool = ((right > 0.55)*1 + (left < 0.45)*1 - 1) > 0
For the example above this gives me:
leftListBool = [False False False True True False False]
rightListBool = [False True False False False True True]
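As a side note, the arithmetic trick above (adding the two comparisons and subtracting 1) can probably be replaced by the elementwise `&` operator, which is what both answers below end up using. A minimal sketch on the sample data; the names `left_arith`, `left_and` and `right_and` are only illustrative:

```python
import numpy as np

left = np.asarray([0.4, 0.2, 0.2, 0.7, 0.6, 0.2, 0.3])
right = np.asarray([0.2, 0.7, 0.3, 0.2, 0.1, 0.9, 0.7])

# Arithmetic form used in the question: True only where both comparisons hold
left_arith = ((left > 0.55) * 1 + (right < 0.45) * 1 - 1) > 0

# Same masks written with the elementwise AND operator
left_and = (left > 0.55) & (right < 0.45)
right_and = (right > 0.55) & (left < 0.45)

print(np.array_equal(left_arith, left_and))
print(left_and)
print(right_and)
```

The parentheses around each comparison matter, because `&` binds more tightly than `>` and `<` in Python.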
However, I cannot go left if my last move was left (and the same holds for right). So I loop over these lists as follows:
wentLeft = False
wentRight = False
a = 0
for idx, v in enumerate(leftListBool):
    if leftListBool[idx] and not wentRight:
        a += DoAThing(idx)
        wentLeft = False
        wentRight = True
    elif rightListBool[idx] and not wentLeft:
        a += DoAnotherThing(idx)
        wentLeft = True
        wentRight = False
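DoAThing() and DoAnotherThing() simply fetch values from a numpy array.
In terms of optimization, this is as far as I have gotten (it was worse before). Note that I need to execute DoAThing() and DoAnotherThing() in the correct order, because they depend on previous values.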
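My first idea was to create a unified list from leftListBool and rightListBool, which would look like this (left = 1 and right = -1):
unified = [0 -1 0 1 1 -1 -1]
But I am stuck on doing this in a more optimized way than:
leftListBool.astype(int) - rightListBool.astype(int)
And even if I manage that, I only need to keep the first value whenever, for example, two 1s follow each other, which would give:
unified = [0 -1 0 1 0 -1 0]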
In that case I could reduce the for loop to:
for i in unified:
    if i == 1:
        a += DoAThing(a)
    elif i == -1:
        a += DoAnotherThing(a)
But even this for loop can probably be optimized with some numpy magic that I have not figured out yet.
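One possible way to build the deduplicated unified array without a Python loop is sketched below. It assumes the encoding left = 1 and right = -1, and that a position is never flagged as both left and right (which holds when the upper threshold is above the lower one). The helper name `unify` is hypothetical, not something from the code above:

```python
import numpy as np

def unify(left_mask, right_mask):
    """Return 1 for left, -1 for right, 0 otherwise, keeping only the
    first entry of each run of identical consecutive non-zero signals."""
    signal = left_mask.astype(np.int8) - right_mask.astype(np.int8)
    idx = np.flatnonzero(signal)      # positions that carry a signal
    vals = signal[idx]                # the signals themselves, zeros skipped
    keep = np.ones(len(idx), dtype=bool)
    keep[1:] = vals[1:] != vals[:-1]  # drop repeats of the previous signal
    out = np.zeros_like(signal)
    out[idx[keep]] = vals[keep]
    return out

left_mask = np.array([False, False, False, True, True, False, False])
right_mask = np.array([False, True, False, False, False, True, True])
print(unify(left_mask, right_mask))
```

Zeros between two equal signals are skipped before comparing neighbours, so "left, nothing, left" still counts as going left twice in a row, which matches the behaviour of the flag-based loop.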
# Complete, runnable version of the loop above, using random data
import time
import numpy as np
start = time.time()
topLimit = 0.55
bottomLimit = 0.45
for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,3)
        left = res[:,0]
        right = res[:,1]
        valList = res[:,2]
        #These two statements can probably be optimized
        leftListBool = ((left > topLimit)*1 + (right < bottomLimit)*1 - 1) > 0
        rightListBool = ((right > topLimit)*1 + (left < bottomLimit)*1 - 1) > 0
        wentLeft = False
        wentRight = False
        a=0
        #Hopefully this loop can be optimized
        for idx, v in enumerate(leftListBool):
            if leftListBool[idx] and not wentRight:
                a += valList[idx]
                wentLeft = False
                wentRight = True
            elif rightListBool[idx] and not wentLeft:
                a += valList[idx]
                wentLeft = True
                wentRight = False
end = time.time()
print(end - start)
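Answer 0 (score: 1)
If you need to loop over your sequence yourself and you care about performance, you should not use numpy.array. NumPy arrays are great when NumPy can do the looping, but if you have to loop yourself it will be slow (I explained in detail why iterating over arrays is rather slow in another recent answer).
You can simply use tolist and zip to avoid the numpy array overhead during iteration: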
import time
import numpy as np
start = time.time()
topLimit = 0.55
bottomLimit = 0.45
for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,2)
        left = res[:,0].tolist() # tolist!
        right = res[:,1].tolist() # tolist!
        wentLeft = False
        wentRight = False
        a=0
        for leftitem, rightitem in zip(left, right):
            if leftitem > topLimit and rightitem < bottomLimit and not wentRight:
                wentLeft, wentRight = False, True
            elif rightitem > topLimit and leftitem < bottomLimit and not wentLeft:
                wentLeft, wentRight = True, False
end = time.time()
print(end - start)
This reduced the run time on my computer by 30%.
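To see the iteration overhead in isolation, a rough micro-benchmark along these lines can help. It is only a sketch: the helper `loop_sum` and the repeat count are arbitrary choices, and the actual timings depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

arr = np.random.rand(200)
lst = arr.tolist()  # plain Python floats instead of NumPy scalars

def loop_sum(seq):
    # A pure-Python loop, like the innermost loop in the question
    total = 0.0
    for x in seq:
        total += x
    return total

t_arr = timeit.timeit(lambda: loop_sum(arr), number=2000)
t_lst = timeit.timeit(lambda: loop_sum(lst), number=2000)
print("iterating over the array:", t_arr)
print("iterating over the list: ", t_lst)
```

Both calls compute the same sum; only the cost of pulling each element out of the container differs.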
You could also do the tolist conversion later (which may or may not be faster):
start = time.time()
topLimit = 0.55
bottomLimit = 0.45
for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,2)
        left = res[:,0]
        right = res[:,1]
        # use tolist after the comparisons
        leftListBool = ((left > topLimit) & (right < bottomLimit)).tolist()
        rightListBool = ((right > topLimit) & (left < bottomLimit)).tolist()
        wentLeft = False
        wentRight = False
        a=0
        #Hopefully this loop can be optimized
        for idx in range(len(leftListBool)): # avoid direct iteration over an array
            if leftListBool[idx] and not wentRight:
                #a += DoAThing(a)
                wentLeft = False
                wentRight = True
            elif rightListBool[idx] and not wentLeft:
                #a += DoAnotherThing(a)
                wentLeft = True
                wentRight = False
end = time.time()
print(end - start)
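This is about as fast as the other approach, but it will probably be faster when left and right are much bigger than 200 elements.
However, this is based only on the algorithm, without knowing what DoAThing and DoAnotherThing do. You may be able to structure them in a way that allows vectorized operations (which could speed it up by an order of magnitude without using list at all). But that is harder, and I do not know what these functions are doing.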
Answer 1 (score: 1)
Based on the updated question, here is one way to vectorize the code:
import time
import numpy as np
start = time.time()
topLimit = 0.55
bottomLimit = 0.45
for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,3)
        left = res[:,0]
        right = res[:,1]
        valList = res[:,2]
        # Arrays containing where to go left and when to go right
        leftListBool = ((left > topLimit) & (right < bottomLimit))
        rightListBool = ((right > topLimit) & (left < bottomLimit))
        # Exclude all points that are neither right nor left
        common = leftListBool | rightListBool
        valList = valList[common]
        leftListBool = leftListBool[common]
        rightListBool = rightListBool[common]
        # Remove the values where you would go right or left multiple times in a row
        leftListBool[1:] &= leftListBool[1:] ^ leftListBool[:-1]
        rightListBool[1:] &= rightListBool[1:] ^ rightListBool[:-1]
        valList = valList[leftListBool | rightListBool]
        # Just use np.sum to calculate the sum of the remaining items
        a = np.sum(valList)
end = time.time()
print(end - start)
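The inner loop is fully vectorized, and (on my computer) this approach is 3 times faster than the original code. Let me know if I should add more explanation for any part. The ^ (xor operator) is just a more performant way of doing np.diff that only works for boolean arrays.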
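A quick way to gain confidence in the vectorized version is to compare it against the loop on random data. The sketch below wraps both variants in hypothetical helper functions (`loop_sum` and `vectorized_sum` are not names from the question) and assumes the left and right masks are never both True at the same index:

```python
import numpy as np

def loop_sum(left_mask, right_mask, values):
    # Reference: Python loop, no two consecutive moves in the same direction
    last = 0  # 1 = last move was left, -1 = last move was right
    total = 0.0
    for i in range(len(values)):
        if left_mask[i] and last != 1:
            total += values[i]
            last = 1
        elif right_mask[i] and last != -1:
            total += values[i]
            last = -1
    return total

def vectorized_sum(left_mask, right_mask, values):
    common = left_mask | right_mask  # keep only positions with a signal
    l, r, v = left_mask[common], right_mask[common], values[common]
    l[1:] &= l[1:] ^ l[:-1]          # drop a left that directly follows a left
    r[1:] &= r[1:] ^ r[:-1]          # drop a right that directly follows a right
    return v[l | r].sum()

rng = np.random.default_rng(0)
res = rng.random((200, 3))
left, right, vals = res[:, 0], res[:, 1], res[:, 2]
left_mask = (left > 0.55) & (right < 0.45)
right_mask = (right > 0.55) & (left < 0.45)
print(np.isclose(loop_sum(left_mask, right_mask, vals),
                 vectorized_sum(left_mask, right_mask, vals)))
```

The comparison uses np.isclose rather than ==, because np.sum may add the floats in a different order than the loop does.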