Comparing large vectors in Python

Date: 2012-10-10 08:49:44

Tags: python vector

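I have two large vectors of different lengths (~133,000 values). Each one is sorted in ascending order. I want to find values that are similar within a given tolerance. This is my solution, but it is very slow. Is there a way to speed it up?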

import numpy as np

# For each element of vector1, print the first element of vector2 within the tolerance
for lv in range(np.size(vector1)):
    for lv_2 in range(np.size(vector2)):
        if np.abs(vector1[lv] - vector2[lv_2]) < .02:
            print(vector1[lv], vector2[lv_2], lv, lv_2)
            break

3 answers:

Answer 0 (score: 0)

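Your algorithm is far from optimal: it compares far too many values. Suppose you are at some position in vector1 and the current value in vector2 is already more than 0.02 larger. Because both vectors are sorted, every later value in vector2 is larger still, so why compare against the rest of vector2?

Start with: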
pos1 = 0
pos2 = 0

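Now compare the values at those positions in the two vectors. If the difference is too large, advance the position that points at the smaller value and check again. Continue until you reach the end of one of the vectors.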
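The steps described here can be sketched as a small function. This is only a minimal sketch, not the answerer's code: the name `match_sorted`, the `tol` parameter, and the choice to advance both positions after a match (so each value is paired at most once) are assumptions made for illustration.

```python
def match_sorted(a, b, tol=0.02):
    """Pair up values of two ascending sequences that differ by less than tol.

    Assumption: after a match both positions advance, so every value
    takes part in at most one pair.
    """
    matches = []
    pos1 = pos2 = 0
    while pos1 < len(a) and pos2 < len(b):
        if abs(a[pos1] - b[pos2]) < tol:
            # Close enough: record values and indices, then move on in both
            matches.append((a[pos1], b[pos2], pos1, pos2))
            pos1 += 1
            pos2 += 1
        elif a[pos1] < b[pos2]:
            pos1 += 1  # value in a is the smaller one, advance it
        else:
            pos2 += 1  # value in b is the smaller one, advance it
    return matches


if __name__ == "__main__":
    for match in match_sorted([0.0, 1.0, 2.0, 5.0], [0.01, 1.5, 2.015, 4.0, 5.001]):
        print(match)
```

Each position only ever moves forward, so the loop runs at most len(a) + len(b) times. If several values in one vector lie within the tolerance of the same value in the other, this pairing reports only one of them.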

Answer 1 (score: 0)

I haven't tested this, but the following should work. The idea is to take advantage of the fact that the vectors are sorted.

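lv_1, lv_2 = 0, 0
# Walk both sorted vectors together; lv_1 indexes vector1, lv_2 indexes vector2
while lv_1 < len(vector1) and lv_2 < len(vector2):
    if np.abs(vector1[lv_1] - vector2[lv_2]) < .02:
        print(vector1[lv_1], vector2[lv_2], lv_1, lv_2)
        lv_1 += 1
        lv_2 += 1
    elif vector1[lv_1] < vector2[lv_2]:
        lv_1 += 1  # vector1 value is the smaller one, advance it
    else:
        lv_2 += 1  # vector2 value is the smaller one, advance it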

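Answer 2 (score: 0)

The following code can improve performance considerably, depending on how densely the numbers are packed. With a set of 1000 random numbers sampled uniformly between 0 and 100, it runs about 30 times faster than your implementation.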

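results1 = []
# first index in vector2 that can still match the current or any later vector1 value
pos1_start = 0

for i in range(np.size(vector1)):
    for j in range(pos1_start, np.size(vector2)):
        if np.abs(vector1[i] - vector2[j]) < .02:
            results1 += [(vector1[i], vector2[j], i, j)]
        else:
            if vector2[j] < vector1[i]:
                # vector2[j] is too small for this and every later vector1 value
                pos1_start += 1
            else:
                # vector2[j] is too large, and so is everything after it
                break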

Timings:

time new method: 0.112464904785
time old method: 3.59720897675

Produced by the following script:

import random
import numpy as np
import time

# initialize the vectors to be compared
vector1 = [random.uniform(0, 40) for i in range(1000)]
vector2 = [random.uniform(0, 40) for i in range(1000)]

vector1.sort()
vector2.sort()

# the arrays that will contain the results for the first method
results1 = []

# the arrays that will contain the results for the second method
results2 = []

pos1_start = 0

t_start = time.time()
for i in range(np.size(vector1)):
    for j in range(pos1_start, np.size(vector2)):
        if np.abs(vector1[i] - vector2[j]) < .02:
            results1 += [(vector1[i], vector2[j], i, j)]
        else:
            if vector2[j] < vector1[i]:
                pos1_start += 1
            else:
                break

t1 = time.time() - t_start
print("time new method:", t1)

t = time.time()
for lv1 in range(np.size(vector1)):
    for lv2 in range(np.size(vector2)):
        if np.abs(vector1[lv1] - vector2[lv2]) < .02:
            results2 += [(vector1[lv1], vector2[lv2], lv1, lv2)]
# measure from t, the start of the old method, not from t_start
t2 = time.time() - t

print("time old method:", t2)
# sort the results

results1.sort()
results2.sort()

# check that both methods found the same pairs
print(np.allclose(results1, results2))
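For comparison, here is a different technique from the loops in the answers: since vector2 is sorted, a binary search can locate the window of candidates for each value of vector1 directly. This is only a sketch under stated assumptions; the function name `pairs_within` and the `tol` parameter are made up for illustration, and only the standard library `bisect` module is used.

```python
from bisect import bisect_left, bisect_right


def pairs_within(a, b, tol=0.02):
    """Return (a[i], b[j], i, j) for every pair with a[i] - tol < b[j] < a[i] + tol.

    Assumption: b is sorted in ascending order (a may be in any order).
    """
    pairs = []
    for i, x in enumerate(a):
        lo = bisect_right(b, x - tol)  # first index with b[j] > x - tol
        hi = bisect_left(b, x + tol)   # first index with b[j] >= x + tol
        for j in range(lo, hi):
            pairs.append((x, b[j], i, j))
    return pairs


if __name__ == "__main__":
    for pair in pairs_within([0.0, 1.0, 2.0], [0.01, 0.015, 1.5, 2.015]):
        print(pair)
```

Each lookup costs O(log n) plus the number of matches it finds. Values that sit exactly on the tolerance boundary may be classified differently from the `abs(...) < .02` test because of floating-point rounding.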