确定列表中的哪些数字加起来为指定值

时间:2014-04-02 03:15:23

标签: python math sum

我有一个快速的(希望会计问题。我刚刚进入一个新工作,书籍有点乱。书籍记录了这些一次性付款,而银行帐户列出了每一笔存款。我需要确定哪些存款属于账簿中的每笔存款。所以,我有这四笔一笔款项:

[6884.41,14382.14,2988.11,8501.60]

然后我有了这个更大的个人存款清单(已分类):

[98.56,98.56,98.56,129.44,160.0,242.19,286.87,290.0,351.01,665.0,675.0,675.0,677.45,677.45,695.0,695.0,695.0,695.0,715.0,720.0,725.0,730.0,745.0, 745.0,750.0,750.0,750.0,750.0,758.93,758.93,763.85,765.0,780.0,781.34,781.7,813.79,824.97,827.05,856.28,874.08,874.44,1498.11,1580.0,1600.0,1600.0]

在Python中,如何确定较长列表的哪个子集总和为一个总和值? (注意:这些数字有一个额外的问题,即一次性总和的总和比个人账户的总和多732.70美元。我希望这不会使这个问题完全无法解决)

1 个答案:

答案 0 :(得分:0)

这是一个很好的解决方案:

import datetime as dt
from itertools import groupby
from math import ceil

def _unique_subsets_which_sum_to(target, value_counts, max_sums, index):
    value, count = value_counts[index]
    if index:
        # more values to be considered; solve recursively
        index -= 1
        rem = max_sums[index]

        # find the minimum amount that this value must provide,
        # and the minimum occurrences that will satisfy that value
        if target <= rem:
            min_k = 0
        else:
            min_k = (target - rem + value - 1) // value  # rounded up to next int

        # find the maximum occurrences of this value
        # which result in <= target
        max_k = min(count, target // value)

        # iterate across min..max occurrences
        for k in range(min_k, max_k+1):
            new_target = target - k*value
            if new_target:
                # recurse
                for solution in _unique_subsets_which_sum_to(new_target, value_counts, max_sums, index):
                    yield ((solution + [(value, k)]) if k else solution)
            else:
                # perfect solution, no need to recurse further
                yield [(value, k)]
    else:
        # this must finish the solution
        if target % value == 0:
            yield [(value, target // value)]

def find_subsets_which_sum_to(target, values):
    """
    Find all unique subsets of values which sum to target

        target   integer >= 0, total to be summed to
        values   sequence of integer > 0, possible components of sum
    """
    # this function is basically a shell which prepares
    #  the input values for the recursive solution

    # turn sequence into sorted list
    values = sorted(values)
    value_sum = sum(values)

    if value_sum >= target:
        # count how many times each value appears
        value_counts = [(value, len(list(it))) for value,it in groupby(values)]

        # running total to each position
        total = 0
        max_sums = [0]
        for val,num in value_counts:
            total += val * num
            max_sums.append(total)

        start = dt.datetime.utcnow()
        for sol in _unique_subsets_which_sum_to(target, value_counts, max_sums, len(value_counts) - 1):
            yield sol
        end = dt.datetime.utcnow()
        elapsed = end - start
        seconds = elapsed.days * 86400 + elapsed.seconds + elapsed.microseconds * 0.000001
        print("  -> took {:0.1f} seconds.".format(seconds))

# I multiplied each value by 100 so that we can operate on integers
# instead of floating-point; this will eliminate any rounding errors.
values = [
    9856, 9856, 9856, 12944, 16000, 24219, 28687, 29000, 35101, 66500,
    67500, 67500, 67745, 67745, 69500, 69500, 69500, 69500, 71500, 72000,
    72500, 73000, 74500, 74500, 75000, 75000, 75000, 75000, 75893, 75893,
    76385, 76500, 78000, 78134, 78170, 81379, 82497, 82705, 85628, 87408,
    87444, 149811, 158000, 160000, 160000
]
sum_to = [
    298811,
    688441,
    850160  #,
            # 1438214
]

def main():
    subset_sums_to = []
    for target in sum_to:
        print("\nSolutions which sum to {}".format(target))
        res = list(find_subsets_which_sum_to(target, values))
        print("  {} solutions found".format(len(res)))
        subset_sums_to.append(res)
    return subset_sums_to

if __name__=="__main__":
    subsetsA, subsetsB, subsetsC = main()

在我的机器上导致

Solutions which sum to 298811
  -> took 0.1 seconds.
  2 solutions found

Solutions which sum to 688441
  -> took 89.8 seconds.
  1727 solutions found

Solutions which sum to 850160
  -> took 454.0 seconds.
  6578 solutions found

# Solutions which sum to 1438214
#   -> took 7225.2 seconds.
#   87215 solutions found

下一步是交叉比较解决方案子集,看看哪些可以共存。我认为最快的方法是存储最小的三个总和的子集,迭代它们(对于兼容的组合)找到剩余的值并将它们插入到最后一次总和的求解器中。


继续我离开的地方(对上面的代码进行一些更改,以获取前三个值的子列表的返回列表)。

我希望每次都能轻松获得剩余的值系数;

class NoNegativesDict(dict):
    def __sub__(self, other):
        if set(other) - set(self):
            raise ValueError
        else:
            res = NoNegativesDict()
            for key,sv in self.iteritems():
                ov = other.get(key, 0)
                if sv < ov:
                    raise ValueError
                # elif sv == ov:
                #     pass
                elif sv > ov:
                    res[key] = sv - ov
            return res

然后我将其应用为

value_counts = [(value, len(list(it))) for value,it in groupby(values)]
vc = NoNegativesDict(value_counts)
nna = [NoNegativesDict(a) for a in subsetsA]
nnb = [NoNegativesDict(b) for b in subsetsB]
nnc = [NoNegativesDict(c) for c in subsetsC]

# this is kind of ugly; with some more effort
# I could probably make it a recursive call also
b_tries = 0
c_tries = 0
sol_count = 0
start = dt.datetime.utcnow()

for a in nna:
    try:
        res_a = vc - a
        sa = str(a)

        for b in nnb:
            try:
                res_b = res_a - b
                b_tries += 1
                sb = str(b)

                for c in nnc:
                    try:
                        res_c = res_b - c
                        c_tries += 1

                        #unpack remaining values
                        res_values = [val for val,num in res_c.items() for i in range(num)]
                        for sol in find_subsets_which_sum_to(1438214, res_values):
                            sol_count += 1
                            print("\n================")
                            print("a =", sa)
                            print("b =", sb)
                            print("c =", str(c))
                            print("d =", str(sol))

                    except ValueError:
                        pass

            except ValueError:
                pass

    except ValueError:
        pass

print("{} solutions found in {} b-tries and {} c-tries".format(sol_count, b_tries, c_tries))
end = dt.datetime.utcnow()
elapsed = end - start
seconds = elapsed.days * 86400 + elapsed.seconds + elapsed.microseconds * 0.000001
print("  -> took {:0.1f} seconds.".format(seconds))

和最终输出:

0 solutions found in 1678 b-tries and 93098 c-tries
  -> took 73.0 seconds.

所以最终答案是您的给定数据没有解决方案

希望有所帮助; - )