Finding values in a list that sum to a given value - optimization

Time: 2015-10-15 21:44:32

Tags: python algorithm python-2.7 optimization

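This is a follow-up to a question I asked earlier this year. I received an answer to the following problem:

I'm trying to write something simple and Pythonic to identify combinations of values from a list that sum to a defined value, within some tolerance.

For example:

If A = [0.4,2,3,1.4,2.6,6.3] and the target value is 5 +/- 0.5, then the output I want is (2,3),(1.4,2.6),(2,2.6),(0.4,2,3),(0.4,3,1.4). If no combinations are found, the function should return 0 or None or something similar.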

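I implemented the suggestion in my code. However, the method (see below) quickly became the performance-limiting step in my code. Each iteration runs fairly quickly, but it is run many, many times.

So I am turning to the community (people smarter than me) for help. Can you see a way to optimize this function, or to replace it with something faster?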

def findSum(self, dataArray, target, tolerance=0.5):
    for i in xrange(1, len(dataArray)+1):

        results = [list(comb) for comb in list(itertools.combinations(dataArray, i)) 
                   if target-tolerance < sum(map(float, comb)) < target+tolerance]

        if len(results) != 0:
            return results

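2 Answers:

Answer 0: (score: 0)

This looks like the Subset sum problem.

According to Wikipedia, if all the elements in A are positive and you are searching for an approximate result, then this algorithm: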

initialize a list S to contain one element 0.
for each i from 1 to N do
   let T be a list consisting of xi + y, for all y in S
   let U be the union of T and S
   sort U
   make S empty 
   let y be the smallest element of U 
   add y to S 
   for each element z of U in increasing order do
      //trim the list by eliminating numbers close to one another
      //and throw out elements greater than s
      if y + cs/N < z ≤ s, set y = z and add z to S 
if S contains a number between (1 − c)s and s, output yes, otherwise no

should give you the answer. It has polynomial complexity.

In pseudo-Python (sorry, I haven't written Python in a while):

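def calc(A, s, c):
    '''
    A is the list containing your numbers (all positive)
    s is your goal value
    c is your approximation, taken as in the pseudocode above: a fraction
      of s with 0 < c < 1, so a sum is accepted if it lies in [(1 - c)*s, s]
    '''
    N = len(A)
    S = [0]
    for x in A:
        # sums reachable by adding x to each sum kept so far
        T = [x + y for y in S]
        U = sorted(set(T) | set(S))
        y = U[0]
        S = [y]
        for z in U:
            # trim the list: skip sums close to the last kept one
            # and throw out sums greater than s
            if y + c * s / N < z <= s:
                y = z
                S.append(z)
    return [x for x in S if (1 - c) * s <= x <= s]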
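
The trimming step is the part of the pseudocode that keeps the list of partial sums short, so it may help to look at it in isolation. Below is a minimal sketch, assuming the input is already sorted in increasing order; the helper name `trim` and its parameters `delta` (the spacing, `c*s/N` in the pseudocode) and `limit` (the goal `s`) are hypothetical names chosen for illustration.

```python
def trim(sorted_sums, delta, limit):
    """Keep the smallest sum, then every sum that exceeds the last kept
    one by more than delta and does not exceed limit."""
    kept = [sorted_sums[0]]
    for z in sorted_sums[1:]:
        # same test as "y + cs/N < z <= s" in the pseudocode
        if kept[-1] + delta < z <= limit:
            kept.append(z)
    return kept


print(trim([0, 1, 1.05, 2, 2.1, 3, 7], 0.2, 5))  # prints [0, 1, 2, 3]
```

Here 1.05 and 2.1 are dropped because they lie within 0.2 of a sum that was already kept, and 7 is dropped because it is larger than the limit.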

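Answer 1: (score: 0)

I provided an answer with a recursive function to the previous question, and I have made some optimizations to it for this answer. Not only is it faster than using itertools.combinations(), it also returns the correct answer.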

import itertools
import timeit

def findSum(dataArray, target, tolerance=0.5):
    for i in xrange(1, len(dataArray)+1):

        results = [list(comb) for comb in list(itertools.combinations(dataArray, i))
                   if target-tolerance <= sum(map(float, comb)) <= target+tolerance]

        if len(results) != 0:
            return results

def recursive(dataArray, target, tolerance=0.5, i=0, possible=[]):
    results = []
    max_target = target + tolerance
    min_target = target - tolerance
    l = len(dataArray)
    while i < l:
        a = dataArray[i]
        i += 1
        if a > max_target:    # possible+[a] is too large
            break
        if a >= min_target:   # Found a set that works
            results.append(possible+[a])
        # recursively try with a shortened list dataArray and a reduced target
        result = recursive(dataArray, target-a, tolerance, i, possible+[a])
        if result:  results += result
    return results

dataArray = [0.4,2,3,1.4,2.6,6.3]
dataArray.sort()
target = 5

print findSum(dataArray, target)
print recursive(dataArray, target)

print timeit.Timer(lambda: findSum(dataArray, target)).timeit(number=100000)
print timeit.Timer(lambda: recursive(dataArray, target)).timeit(number=100000)

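As you can see in the output, your function returns only two results, because it returns as soon as one combination size yields a match, whereas the recursive function returns all five. For 100000 iterations, your function takes about 2 seconds and the recursive function about 1.85 seconds.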

[[2, 2.6], [2, 3]]
[[0.4, 1.4, 3], [0.4, 2, 2.6], [0.4, 2, 3], [2, 2.6], [2, 3]]
2.03791809082
1.84496808052
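
For comparison, here is a minimal sketch of an itertools-based variant that collects matches across every combination size instead of stopping at the first size that produces one. The name `find_all_sums` is hypothetical, and the sketch is written so that it also runs on Python 3 (`range` instead of `xrange`). It still enumerates every combination, so it is not expected to be faster than the original.

```python
import itertools


def find_all_sums(data, target, tolerance=0.5):
    """Return every combination whose sum lies in target +/- tolerance."""
    low, high = target - tolerance, target + tolerance
    results = []
    for size in range(1, len(data) + 1):
        for comb in itertools.combinations(data, size):
            if low <= sum(comb) <= high:
                results.append(list(comb))
    return results


matches = find_all_sums([0.4, 1.4, 2, 2.6, 3, 6.3], 5)
print(len(matches))  # prints 5
```

On the sample data this finds the same five combinations as the recursive function, only ordered by size.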

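Depending on the data, the recursive function can be faster, because it exits the loop as soon as the sorted data no longer fits within the target range. This requires dataArray to be sorted before the function is called.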
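
One way to make the sorting requirement harder to forget is to sort inside a wrapper. The following is a sketch of the same early-exit idea under two assumptions that are mine rather than the answer's: the function name `sums_near` is hypothetical, and the values are assumed to be non-negative, since with negative values a later element could bring a sum back into range and the `break` would skip valid combinations.

```python
def sums_near(values, target, tolerance=0.5):
    """Return all combinations of non-negative values whose sum lies
    in target +/- tolerance."""
    data = sorted(values)  # the early exit below relies on ascending order
    results = []

    def search(start, remaining, chosen):
        for i in range(start, len(data)):
            a = data[i]
            if a > remaining + tolerance:
                break  # every later value is at least as large
            if a >= remaining - tolerance:
                results.append(chosen + [a])
            # keep extending the combination with a reduced target
            search(i + 1, remaining - a, chosen + [a])

    search(0, target, [])
    return results


print(len(sums_near([0.4, 2, 3, 1.4, 2.6, 6.3], 5)))  # prints 5
```

The caller can pass the list in any order, at the cost of one sort per call.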