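I have two large vectors of different lengths (about 133,000 values each). Both are sorted in ascending order. I want to find the values that agree within a given tolerance. This is my solution, but it is slow. Is there a way to speed it up?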
import numpy as np

# Brute force: for each value in vector1, scan vector2 up to the first value within 0.02
for lv in range(np.size(vector1)):
    for lv_2 in range(np.size(vector2)):
        if np.abs(vector1[lv] - vector2[lv_2]) < .02:
            print(vector1[lv], vector2[lv_2], lv, lv_2)
            break
Answer 0 (score: 0)
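Your algorithm is far from optimal: you compare too many values. Suppose you are at some position in vector1 and the current value in vector2 is already more than 0.02 larger than it. Why compare the rest of vector2?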
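Start with pos1 = 0 and pos2 = 0.
Now compare the values of the two vectors at these positions. If the difference is too large, advance the position that points at the smaller value and check again. Continue until you reach the end of one of the vectors.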
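A minimal sketch of this forward-only walk, assuming the goal is the one in the question's code: for each value in vector1, report the first value in vector2 within the tolerance. The function name `first_matches` and its `tol` parameter are hypothetical. Because vector1 is sorted, a vector2 value that is too far below one vector1 value is also too far below every later one, so the vector2 index never moves backwards and the whole scan is O(len(vector1) + len(vector2)).

```python
def first_matches(vector1, vector2, tol=0.02):
    """For each vector1[i], find the first vector2[j] with |vector1[i] - vector2[j]| < tol.

    Hypothetical helper; assumes both inputs are sorted in ascending order.
    Returns (value1, value2, i, j) tuples, like the question's print statement.
    """
    matches = []
    j = 0
    n2 = len(vector2)
    for i, a in enumerate(vector1):
        # Skip vector2 values that are below a and too far away; they are
        # also too far away for every later (larger) vector1 value.
        while j < n2 and vector2[j] < a and abs(a - vector2[j]) >= tol:
            j += 1
        if j == n2:
            break  # vector2 is exhausted, no further matches are possible
        # vector2[j] is either within tol of a, or already too far above it
        if abs(a - vector2[j]) < tol:
            matches.append((a, vector2[j], i, j))
    return matches
```

For example, `first_matches([1.0, 2.0, 3.0], [0.5, 1.01, 2.5, 3.015])` returns `[(1.0, 1.01, 0, 1), (3.0, 3.015, 2, 3)]`.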
Answer 1 (score: 0)
I haven't tested this, but the following should work. The idea is to exploit the fact that the vectors are sorted.
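import numpy as np

# Walk both sorted vectors with one index each; every value is paired at most once
lv_1, lv_2 = 0, 0
while lv_1 < len(vector1) and lv_2 < len(vector2):
    if np.abs(vector1[lv_1] - vector2[lv_2]) < .02:
        print(vector1[lv_1], vector2[lv_2], lv_1, lv_2)
        lv_1 += 1
        lv_2 += 1
    elif vector1[lv_1] < vector2[lv_2]:
        # vector1's value is too small to match anything from lv_2 onward
        lv_1 += 1
    else:
        # vector2's value is too small to match anything from lv_1 onward
        lv_2 += 1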
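Note that this walk pairs each value at most once: after a match both indices advance, so a value lying within 0.02 of several values in the other vector is reported only once. The wrapper below (the name `greedy_match` is hypothetical) is the same loop returning index pairs instead of printing, with a small example showing the effect.

```python
def greedy_match(vector1, vector2, tol=0.02):
    """Same two-index walk as above, returning (i, j) index pairs.

    Hypothetical wrapper; assumes both inputs are sorted in ascending order.
    """
    i, j = 0, 0
    pairs = []
    while i < len(vector1) and j < len(vector2):
        if abs(vector1[i] - vector2[j]) < tol:
            pairs.append((i, j))
            i += 1
            j += 1
        elif vector1[i] < vector2[j]:
            i += 1
        else:
            j += 1
    return pairs


# 1.0 is within 0.02 of both 1.005 and 1.01, but only the first pair is reported
print(greedy_match([1.0], [1.005, 1.01]))  # [(0, 0)]
```

If every close pair is needed rather than a one-to-one pairing, the windowed scan in the next answer fits better.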
Answer 2 (score: 0)
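The following code can improve performance considerably, depending on how densely packed the numbers are. On a set of 1000 random numbers sampled uniformly between 0 and 100, it ran 30 times faster than your implementation.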
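import numpy as np

results1 = []
pos1_start = 0  # index of the first vector2 value that can still match
for i in range(np.size(vector1)):
    for j in range(pos1_start, np.size(vector2)):
        if np.abs(vector1[i] - vector2[j]) < .02:
            results1.append((vector1[i], vector2[j], i, j))
        elif vector2[j] < vector1[i]:
            # too far below vector1[i], so also too far below every later vector1 value
            pos1_start += 1
        else:
            # too far above vector1[i]; the rest of vector2 is even larger
            break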
Timings:
time new method: 0.112464904785
time old method: 3.59720897675
Generated by the following script:
import random
import time

import numpy as np

# initialize the sorted vectors to be compared
vector1 = [random.uniform(0, 40) for i in range(1000)]
vector2 = [random.uniform(0, 40) for i in range(1000)]
vector1.sort()
vector2.sort()

# the list that will contain the results of the first (new) method
results1 = []
# the list that will contain the results of the second (old) method
results2 = []

# new method: skip vector2 values that are already too small
pos1_start = 0
t_start = time.time()
for i in range(np.size(vector1)):
    for j in range(pos1_start, np.size(vector2)):
        if np.abs(vector1[i] - vector2[j]) < .02:
            results1 += [(vector1[i], vector2[j], i, j)]
        else:
            if vector2[j] < vector1[i]:
                pos1_start += 1
            else:
                break
t1 = time.time() - t_start
print("time new method:", t1)

# old method: compare every pair
t = time.time()
for lv1 in range(np.size(vector1)):
    for lv2 in range(np.size(vector2)):
        if np.abs(vector1[lv1] - vector2[lv2]) < .02:
            results2 += [(vector1[lv1], vector2[lv2], lv1, lv2)]
t2 = time.time() - t
print("time old method:", t2)

# sort the results and check that both methods found the same pairs
results1.sort()
results2.sort()
print(np.allclose(results1, results2))
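If NumPy is available, the windowed idea of this answer can also be vectorized instead of looped. The sketch below swaps in `np.searchsorted`, which is a different technique from the loops above: for each value a of vector1, the close vector2 values form one contiguous run between a - tol and a + tol, so two binary searches give the start and end of that run, and the runs are then expanded into index pairs. The function name `close_pairs` is hypothetical, both inputs are assumed sorted, and `tol` is assumed positive. Pairs sitting within floating-point rounding of the tolerance edge may be classified differently than with `np.abs(...) < .02`.

```python
import numpy as np


def close_pairs(vector1, vector2, tol=0.02):
    """Return index arrays (i, j) of all pairs with vector2[j] within tol of vector1[i].

    Hypothetical helper; assumes both inputs are sorted ascending and tol > 0.
    """
    v1 = np.asarray(vector1, dtype=float)
    v2 = np.asarray(vector2, dtype=float)
    # lo[k]: first index with v2 > v1[k] - tol; hi[k]: first index with v2 >= v1[k] + tol
    lo = np.searchsorted(v2, v1 - tol, side="right")
    hi = np.searchsorted(v2, v1 + tol, side="left")
    counts = hi - lo  # number of matches for each vector1 value
    i_idx = np.repeat(np.arange(len(v1)), counts)
    # position of each pair inside its own run: 0, 1, ..., counts[k] - 1
    run_offsets = np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)
    j_idx = np.repeat(lo, counts) + run_offsets
    return i_idx, j_idx


i_idx, j_idx = close_pairs([1.0, 2.0], [1.005, 1.01, 2.5])
print(list(zip(i_idx.tolist(), j_idx.tolist())))  # [(0, 0), (0, 1)]
```

The values themselves can be recovered as `v1[i_idx]` and `v2[j_idx]` if the `(value1, value2, i, j)` tuples of the script above are needed.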