Question

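I have two large lists of numbers (each with perhaps a million elements). I want to compare the two lists element by element to identify pairs of elements whose difference is less than 0.5. I know that two nested for loops are not an option. Is there any fast way to do this using sets or zip?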

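For example, if my lists are list1 = [1,2,3,4] and list2 = [3,4,5,6] and the condition is 1, then the solution would arrange the matching pairs in a list in the form [element from list1, element from list2, difference].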

Thanks

Answer 1

This should work. (Feedback appreciated.)

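Basically, my idea is to sort both lists, which is O(n log n), and then walk through them while keeping track of the distance to the next element. That way not all pairs are computed, only a subset, which gives me roughly O(2 * m * n), where m is the maximum allowed distance.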

x = sorted([0, 2, 3, 4])
y = sorted([1, 3, 4, 5, 6])
index = 0  # position in y to rewind to before scanning for the next x value
delta = 1  # maximum allowed absolute difference
output = []
j = 0
value_2 = y[0]
no_more = False  # set once the scan has run off the end of y
number_of_operation = 0  # number of comparisons, for inspection only
for i, value_1 in enumerate(x):
    print(f'Testing for this {value_1}')
    # If the next x value is more than 2 * delta away, no y value can match
    # both, so the scan position in y does not need to be rewound afterwards.
    skip = i + 1 < len(x) and x[i + 1] - value_1 > 2 * delta
    if skip:
        print('We can directly skip to next')
    while value_2 - value_1 <= delta:
        number_of_operation += 1
        if abs(value_1 - value_2) <= delta:
            output.append([value_1, value_2, value_1 - value_2])
        j += 1
        if j == len(y):
            no_more = True
            print('end of list')
            break
        value_2 = y[j]
    if skip:
        index = j
        if no_more:
            # y is exhausted and every remaining x value is too far away
            print('end')
            break
    else:
        print('Going back')
        j = index
        value_2 = y[index]
        no_more = False
    print(number_of_operation)
print(output)
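
As a possible variant of the same sort-first idea, the window of candidates in the second list can be located with binary search instead of a manual scan. The sketch below is only an illustration: the function name `close_pairs` is hypothetical, and it assumes the result format [element from first list, element from second list, difference] from the question. With floating-point inputs, values that sit exactly on the boundary may be classified slightly differently than by a direct `abs(x - y) <= delta` check.

```python
from bisect import bisect_left, bisect_right


def close_pairs(a, b, delta):
    """Return [x, y, x - y] for every x in a and y in b with |x - y| <= delta.

    Hypothetical helper: sorts both inputs, then uses binary search to find
    the slice of b that lies within delta of each x.
    """
    b_sorted = sorted(b)
    result = []
    for value in sorted(a):
        lo = bisect_left(b_sorted, value - delta)   # first y >= value - delta
        hi = bisect_right(b_sorted, value + delta)  # one past last y <= value + delta
        for other in b_sorted[lo:hi]:
            result.append([value, other, value - other])
    return result


print(close_pairs([0, 2, 3, 4], [1, 3, 4, 5, 6], 1))
```

Each element of the first list costs one pair of binary searches plus the number of matches it produces, so the scan never has to rewind over values that are already out of range.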

Answer 2

Use numpy's broadcasting.

However, you cannot avoid the N^2 comparisons this way; you only benefit from numpy's optimized speed.
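
A minimal sketch of what a broadcasting approach might look like, using the small lists from the question. The variable names (`threshold`, `diff`, `pairs`) are illustrative assumptions, not taken from the original answer. Note that the full difference matrix has len(list1) * len(list2) entries, so for two million-element lists it would have to be processed in chunks.

```python
import numpy as np

list1 = np.array([1, 2, 3, 4])
list2 = np.array([3, 4, 5, 6])
threshold = 1  # maximum allowed absolute difference (assumed inclusive)

# Broadcasting a column against a row gives diff[i, j] = list1[i] - list2[j].
diff = list1[:, None] - list2[None, :]

# Indices of all (i, j) whose absolute difference is within the threshold.
rows, cols = np.nonzero(np.abs(diff) <= threshold)

# One row per match: [element from list1, element from list2, difference].
pairs = np.column_stack((list1[rows], list2[cols], diff[rows, cols]))
print(pairs.tolist())
```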

Answer 3

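You can avoid the O(N²) behaviour if you sort the lists first (or if they are already sorted). You can then step through them element by element. This gives you O(n log n) for the sort plus O(n) to step through the elements. For example: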

excerpt_body

Producing...

999998 999999 1
999999 999999 0
999999 1000000 1

...which is much faster than computing the product of the two lists first.
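
The sort-then-step idea described in this answer could be sketched as follows. This is not the answer's original code: the generator name `close_pairs_sorted` and the parameter `max_diff` are hypothetical, the inputs are assumed to be sorted in ascending order already, and the threshold is assumed to be inclusive. The scan itself takes time proportional to the list lengths plus the number of pairs it yields.

```python
def close_pairs_sorted(a, b, max_diff):
    """Yield (x, y, abs(x - y)) for x in a, y in b with abs(x - y) <= max_diff.

    Hypothetical helper; both a and b must already be sorted ascending.
    """
    start = 0  # first index in b that can still match the current or a later x
    for x in a:
        # Values in b that are too small for x are too small for every later x.
        while start < len(b) and x - b[start] > max_diff:
            start += 1
        # Emit every b value from the window start up to x + max_diff.
        j = start
        while j < len(b) and b[j] - x <= max_diff:
            yield x, b[j], abs(x - b[j])
            j += 1


list1 = sorted([1, 2, 3, 4])
list2 = sorted([3, 4, 5, 6])
for x, y, d in close_pairs_sorted(list1, list2, 1):
    print(x, y, d)
```

Because it is a generator, the pairs can be consumed one at a time instead of being held in memory all at once, which may matter for lists with a million elements each.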

Comparing two large lists to find pairs of elements satisfying a condition in python
