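Find similarities between two lists

Date: 2019-07-02 20:42:19

Tags: python performance

I have two lists of numbers (L1 and L2). I have to find out whether the sum of any combination of numbers from L1 equals the sum of any combination of numbers from L2.

I tried a double loop over the output of a powerset() function, but it is slow.

powerset() generator: technomancy.org/python/powerset-generator-python.

I am not posting my code because what I need are ideas, approaches, or anything else that could point me in the right direction.

Additional issue: ListA can have 100 elements or more (same for B).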

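5 answers:

Answer 0: (score: 2)

This is a dynamic programming approach. It works well if you have integers. The advantage is that you only keep track of one way of reaching any particular sum, which means your performance is bounded by the number of distinct sums.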

def all_sums (numbers):
    answer = {0: None}
    for n in numbers:
        next_answer = {}
        for s, path in answer.items():
            next_answer[s] = path
            next_answer[round(s + n, 8)] = [n, path]
        answer = next_answer
    if answer[0] is None:
        answer.pop(0)
    return answer

def find_matching_sum (numbers1, numbers2):
    sums1 = all_sums(numbers1)
    sums2 = all_sums(numbers2)
    for s1, path1 in sums1.items():
        if s1 in sums2:
            return [s1, path1, sums2[s1]]
    return None

listA = [455, 698, 756, 3.56, -9]

listB = [526,55,943,156,531,304,618,911,598,498,268,926,899,898,131,966,303,936,509,67,976,639,74,935,23,226,422,280,64,975,583,596,583]
print(find_matching_sum(listA, listB))
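
The paths stored by all_sums() are nested as [n, rest_of_path], ending in None. A small helper can unroll one into a flat list of the numbers used. This is only a sketch; the name flatten_path is hypothetical and not part of the answer above.

```python
def flatten_path(path):
    """Unroll the nested [n, rest] structure built by all_sums() into a flat list."""
    items = []
    while path is not None:
        n, path = path          # each node is a two-item list: [number, rest of path]
        items.append(n)
    return items

# Nested path in the shape all_sums() stores for the sum 446 (455 first, then -9).
nested = [-9, [455, None]]
print(flatten_path(nested))     # [-9, 455]
```

Applied to the two paths returned by find_matching_sum(), this gives the elements from each list that produce the matching sum.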

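For floating-point numbers, I suggest multiplying by a common denominator to get integers. This handles the 0.1 + 0.2 != 0.3 problem. Also note that with floats it is very easy to end up with a very large number of possible sums, in which case the dynamic programming approach is no longer a win. As an extreme example, consider [..., 8, 4, 2, 1, 0.5, 0.25, 0.125, ...]: now the entire powerset comes into play...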
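
One possible way to do the scaling, assuming the values have short decimal representations: convert each number to an exact fraction through its string form and multiply by the least common multiple of the denominators. The helper name to_integers is hypothetical.

```python
from fractions import Fraction
from math import gcd

def to_integers(numbers):
    """Scale numbers by the least common multiple of their denominators."""
    # str() keeps 3.56 as 89/25 instead of its binary floating-point approximation
    fracs = [Fraction(str(x)) for x in numbers]
    scale = 1
    for f in fracs:
        scale = scale * f.denominator // gcd(scale, f.denominator)
    return [int(f * scale) for f in fracs], scale

ints, scale = to_integers([455, 698, 756, 3.56, -9])
print(ints, scale)   # [11375, 17450, 18900, 89, -225] 25
```

Both lists have to be scaled by the same factor, so in practice the helper would be applied to the concatenation of the two lists, and any matching sum divided by scale afterwards.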

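Answer 1: (score: 1)

If you are still in the range where the full power sets can be generated (so that we do not have to work around that), you can simply sort the power sets by the value of their sums and compare them in order, as in a merge sort. This reduces the run time from O(2^N * 2^M) to O(2^N + 2^M) (plus the cost of sorting), which is still not great, but it does reduce the effective problem size from O(N+M) to O(max(N,M)).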
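
A minimal sketch of that idea, assuming lists small enough for the power sets to fit in memory. The function names are hypothetical, and the walk over the two sorted lists is the merge step of a merge sort, stopped at the first equal pair.

```python
from itertools import combinations

def subset_sums(numbers):
    """Sorted sums of all non-empty subsets (exponential: small lists only)."""
    return sorted(
        sum(combo)
        for size in range(1, len(numbers) + 1)
        for combo in combinations(numbers, size)
    )

def first_common_sum(a, b):
    """Walk both sorted sum lists in parallel and return the smallest common sum."""
    sums_a, sums_b = subset_sums(a), subset_sums(b)
    i = j = 0
    while i < len(sums_a) and j < len(sums_b):
        if sums_a[i] == sums_b[j]:
            return sums_a[i]
        if sums_a[i] < sums_b[j]:
            i += 1              # advance whichever side is behind
        else:
            j += 1
    return None

print(first_common_sum([4, 3, 5, 1], [2, 6, 4, 3]))   # 3
```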

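Answer 2: (score: 1)

To get a common sum (but not the elements that make it up), this algorithm gives an answer quickly when your numbers produce redundant sums: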

def commonSum(A,B):
    commonSet = set(A).intersection(B) # common values are common sums
    if commonSet: return min(commonSet)   

    maxSumA = sum([a for a in A if a>0] or [max(A)])
    maxSumB = sum([b for b in B if b>0] or [max(B)])
    minSumA = sum([a for a in A if a<0] or [min(A)])
    minSumB = sum([b for b in B if b<0] or [min(B)])
    if maxSumA < minSumB: return None # no possible common sums
    if maxSumB < minSumA: return None

    sumsA,sumsB   = set(),set()                 # sets of cumulative sums
    diffA,diffB   = set([sum(A)]),set([sum(B)]) # sets of cumulative differences from total
    iterA,iterB   = iter(sorted(A,key=abs)),iter(sorted(B,key=abs))
    valueA,valueB = next(iterA,None), next(iterB,None)
    while valueA is not None or valueB is not None: # traverse the two lists in parallel until a sum is found or both are exhausted
        if valueA is not None:            
            newSums  = [valueA+s for s in sumsA \
                        if valueA+s <= maxSumB and valueA+s not in sumsA] + [valueA]
            sumsA.update(newSums) # new sums formed by combining element with all current ones
            newDiffs = [d-valueA for d in diffA \
                        if d-valueA >= minSumB and d-valueA not in diffA]
            diffA.update(newDiffs) # new differences formed by combining current element with existing ones
            valueA = next(iterA,None)
        if valueB is not None:
            newSums = [valueB+s for s in sumsB \
                       if valueB+s <= maxSumA and valueB+s not in sumsB] + [valueB]
            sumsB.update(newSums) # new sums formed by combining element with all current ones
            newDiffs = [d-valueB for d in diffB \
                        if d-valueB >= minSumA and d-valueB not in diffB]
            diffB.update(newDiffs) # new differences formed by combining current element with existing ones
            valueB = next(iterB,None)
        commonSet = (diffA & diffB) | (sumsA & sumsB) # detect common sums
        if commonSet: return min(commonSet)

With the provided example, and with random sets of values, results consistently come back within a few milliseconds.

listA = [455, 698, 756, 3.56, -9]
listB = [526,55,943,156,531,304,618,911,598,498,268,926,899,898,131,966,303,936,509,67,976,639,74,935,23,226,422,280,64,975,583,596,583]

commonSum(listA,listB) # 446,  1.1 millisecond

import random
listA = [ random.randint(-1000,1000)*2-1 for _ in range(50) ] # only odd numbers
listB = [ random.randint(-500,1500)*2 for _ in range(50) ]    # only even numbers

commonSum(listA,listB) # 0.1 to 2 milliseconds

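The function is optimized to quickly rule out mutually exclusive lists that cannot have a common sum, and to find a common sum as early as possible when one exists. When there is no common sum, it does waste some time on repeated checks (but this could be optimized further if needed).

It works by progressively building sets of distinct possible sums (in sumsA and sumsB). This minimizes the size of the collections to compare and allows a more direct detection of a common sum (the set intersection sumsA & sumsB). The lists are processed in ascending order of magnitude (absolute value) so that smaller increments are considered earlier, which maximizes the chances of hitting a given sum. Alongside the cumulative sums, the algorithm uses two more sets (diffA, diffB) to check cumulative differences: the sum of all elements minus combinations of the elements processed so far. The intersection of these two sets (diffA & diffB) also helps find a common sum quickly.

The function performs well when the values are relatively close to each other (i.e. low variance), but slows down considerably in worst-case scenarios. It is generally suitable for applications where values follow a normal distribution.

To get the actual combinations of numbers, you can use the commonSum() function to reduce the problem to finding a combination of list elements that produces a known sum (also known as the subset sum problem). Because we know the list being analysed contains a solution, that subset can usually be found quickly.

Here is a function that finds list elements that add up to a known value: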

from itertools import combinations,islice
def findSum(S,A,sumA=None,maxSumA=None,minSumA=None,lenA=None,pairs=None):
    if sumA is None: # sort and prepare accumulators (once)
        A,lenA  = sorted(A),len(A)
        pairs   = dict()
        for a,b in combinations(reversed(A),2):  # a+b pairs with lowest a where a >= b
            if a+b == S: return [b,a]
            pairs[a+b] = [a,b] 
        sumA    = sum(A)
        maxSumA = sum([a for a in A if a>0] or [max(A)])
        minSumA = sum([a for a in A if a<0] or [min(A)])
    # exit conditions
    if lenA == 0: return None                    # no more elements to sum
    if S < minSumA or S > maxSumA: return None   # sum unreachable
    if sumA == S: return A[:lenA]                # all values form the sum
    if S in islice(A,lenA): return [S]           # sum is one of the values (avoids creating temp list)
    if lenA == 1: return None                    # if only one, it must be the one
    if S in pairs \
    and all(p<=a for p,a in zip(pairs[S],A[lenA-2:])): # quick pairs
        return pairs[S]
    if lenA == 2 : return None                   # if only two, pairs would catch it
    for a in islice(A,lenA):
        if S-a not in pairs: continue            # quick triples
        triple = sorted([a]+pairs[S-a])
        if all(triple.count(n)==A[:lenA].count(n) for n in triple):
            return triple
    if lenA == 3: return None
    # isolate maximum value, update accumulators
    maxA     = A[lenA-1]
    sumA    -= maxA
    maxSumA -= maxA
    minSumA -= min(0,maxA)
    if maxSumA <= 0: maxSumA = maxA
    # include max in result and recurse
    usingMax = findSum(S-maxA,A,sumA,maxSumA,minSumA,lenA-1,pairs)  
    if usingMax : return usingMax + [maxA]
    # exclude max from result and recurse
    return findSum(S,A,sumA,maxSumA,minSumA,lenA-1,pairs) 

You can use this function on the original lists with the result of commonSum():

s = commonSum(listA,listB)  # 446
r1 = findSum(s,listA)       # [-9, 455]
r2 = findSum(s,listB)       # [23, 55, 64, 304]

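For these simple cases the findSum() function responds within milliseconds, but in some worst-case scenarios the response time can be much longer.

You can combine the two functions into one: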

def commonSumItems(A,B):
    s = commonSum(A,B)
    if s is None: return None
    sA = findSum(s,A)
    sB = findSum(s,B)
    return (s,sA,sB)

commonSumItems(listA,listB) 

# (446, [-9, 455], [23, 55, 64, 304])

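There is still plenty of room for further optimization in both algorithms, but I believe they are fast enough for most use cases.

Potential optimizations:

  • In commonSum(), stop checking the decreasing sums (diffA, diffB) once half of the items in the lists have been processed.
  • In commonSum(), compute for each element the smallest difference to its nearest neighbour in the list and use that as the sort key.
  • In commonSumItems(), take the subsets of listA and listB processed so far and use them, instead of the whole lists, when calling findSum().

EDIT

I tried to solve a monster list found in another post, but findSum() was not fast enough. So I created a function that simplifies the problem by eliminating elements that cannot take part in any combination producing the target sum's remainder under a given modulus. This is what allowed me to solve that monster list.

Here is the function (it calls findSum after the simplification):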

from collections import defaultdict
def findSum2(S,A,modSize=None):
    eligibleA = A.copy()
    for modSize in range(2,len(A)*2): # modulo sum sizes
        sumD = defaultdict(set)       # eligible number per modulo of sum
        for a in eligibleA:
            newD = [(a%modSize,[a])]  # new modulo sum candidates
            for d,ad in sumD.items():
                m = (a+d) % modSize
                newD.append((m,ad | set([a])))
            for m,ad in newD:         # merge new candidates to target modulos
                sumD[m].update(ad)
        targetModList = sumD[S % modSize] # keep only eligibles
        eligibleA = [ a for a in eligibleA if a in targetModList ]
    return findSum(S,eligibleA) # call original algorithm with "cleaned" list

Here is the "monster" list I wanted to solve using findSum:

targetB = 262988806539946324131984661067039976436265064677212251086885351040000
B = [116883914017753921836437627140906656193895584300983222705282378240000,
 65747201634986581032996165266759994109066266169303062771721337760000,
 42078209046391411861117545770726396229802410348353960173901656166400,
 29220978504438480459109406785226664048473896075245805676320594560000,
 21468474003260924418937523352411426647858372626711204170357987840000,
 16436800408746645258249041316689998527266566542325765692930334440000,
 12987101557528213537381958571211850688210620477887024745031375360000,
 10519552261597852965279386442681599057450602587088490043475414041600,
 8693844844295746252297013588993057072273225278585528961549928960000,
 7305244626109620114777351696306666012118474018811451419080148640000,
 6224587137040149683597270084426981690799173128454727836375984640000,
 5367118500815231104734380838102856661964593156677801042589496960000,
 4675356560710156873457505085636266247755823372039328908211295129600,
 4109200102186661314562260329172499631816641635581441423232583610000,
 3639983481521748430892521260443459881470796742937193786669693440000,
 3246775389382053384345489642802962672052655119471756186257843840000,
 2914003396564502206448583502127866774917064428556368433095682560000,
 2629888065399463241319846610670399764362650646772122510868853510400,
 2385386000362324935437502594712380738650930291856800463373109760000,
 2173461211073936563074253397248264268068306319646382240387482240000,
 1988573206351200938616141104476672789688204647842814753019927040000,
 1826311156527405028694337924076666503029618504702862854770037160000,
 1683128361855656474444701830829055849192096413934158406956066246656,
 1556146784260037420899317521106745422699793282113681959093996160000,
 1443011284169801504153550952356872298690068941987447193892375040000,
 1341779625203807776183595209525714165491148289169450260647374240000,
 1250838556670374906691960338012080744048823137584838292922165760000,
 1168839140177539218364376271409066561938955843009832227052823782400,
 1094646437211014876720019400903392201607763016346356924399106560000,
 1027300025546665328640565082293124907954160408895360355808145902500,
 965982760477305139144112620999228563585913919842836551283325440000,
 909995870380437107723130315110864970367699185734298446667423360000,
 858738960130436976757500934096457065914334905068448166814319513600,
 811693847345513346086372410700740668013163779867939046564460960000,
 768411414287644482489363509326632509674989232073666182868912640000,
 728500849141125551612145875531966693729266107139092108273920640000,
 691620793004461075955252231602997965644352569828303092930664960000,
 657472016349865810329961652667599941090662661693030627717213377600,
 625791330255672395317036671188673352614551016483550865168079360000,
 596346500090581233859375648678095184662732572964200115843277440000,
 568931977371436071675467087219123799753953628290345594563299840000,
 543365302768484140768563349312066067017076579911595560096870560000,
 519484062301128541495278342848474027528424819115480989801255014400,
 497143301587800234654035276119168197422051161960703688254981760000,
 476213321032044045508347054897310957784092466595223632570186240000,
 456577789131851257173584481019166625757404626175715713692509290000,
 438132122515529069774235170457376054037925971973698044293020160000,
 420782090463914118611175457707263962298024103483539601739016561664,
 404442609057972047876946806715939986830088526993021531852188160000,
 389036696065009355224829380276686355674948320528420489773499040000,
 374494562534633427030238036407319297168052779889230688624970240000,
 360752821042450376038387738089218074672517235496861798473093760000,
 347753793771829850091880543559722282890929011143421158461997158400,
 335444906300951944045898802381428541372787072292362565161843560000,
 323778155173833578494287055791985197213007158728485381455075840000,
 312709639167593726672990084503020186012205784396209573230541440000,
 302199145693704480473409550206308504954053507241841138853071360000,
 292209785044384804591094067852266640484738960752458056763205945600,
 282707666261699891568916593460940582033071824431295083135592960000,
 273661609302753719180004850225848050401940754086589231099776640000,
 265042888929147215048611399412486748738992254650755607041456640000,
 256825006386666332160141270573281226988540102223840088952036475625,
 248983485481605987343890803377079267631966925138189113455039385600,
 241495690119326284786028155249807140896478479960709137820831360000,
 234340660761814501342824380545368657996226388663143017230461440000,
 227498967595109276930782578777716242591924796433574611666855840000,
 220952578483466770957349011608519198854244960871423861446658560000,
 214684740032609244189375233524114266478583726267112041703579878400,
 208679870295533683104133831435857945991878646837700655494453760000,
 202923461836378336521593102675185167003290944966984761641115240000,
 197401994025105141026072179446079922264038329650750423033879040000,
 192102853571911120622340877331658127418747308018416545717228160000,
 187014262428406274938300203425450649910232934881573156328451805184,
 182125212285281387903036468882991673432316526784773027068480160000,
 177425404985627474536673746714144021883127046501745489011223040000,
 172905198251115268988813057900749491411088142457075773232666240000,
 168555556186474170249629649778586749838977769381324948621621760000,
 164368004087466452582490413166899985272665665423257656929303344400]

It finds a solution in less than a second:

 findSum2(targetB,B) # 0.3 second

 [202923461836378336521593102675185167003290944966984761641115240000,
  292209785044384804591094067852266640484738960752458056763205945600,
  335444906300951944045898802381428541372787072292362565161843560000,
  519484062301128541495278342848474027528424819115480989801255014400,
  657472016349865810329961652667599941090662661693030627717213377600,
  811693847345513346086372410700740668013163779867939046564460960000,
  858738960130436976757500934096457065914334905068448166814319513600,
  1168839140177539218364376271409066561938955843009832227052823782400,
  1826311156527405028694337924076666503029618504702862854770037160000,
  2385386000362324935437502594712380738650930291856800463373109760000, 
  29220978504438480459109406785226664048473896075245805676320594560000,
  42078209046391411861117545770726396229802410348353960173901656166400,
  65747201634986581032996165266759994109066266169303062771721337760000,
  116883914017753921836437627140906656193895584300983222705282378240000]     

EDIT2

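For some data, it helps to try subsets of the list elements grouped by the greatest common divisor (gcd) they share with the target sum. The idea is that values in the list that are multiples of some value K will always combine into a multiple of K. If the target is a multiple of K, all the other elements in the list (those that are not multiples of K) will only interfere. This enables a strategy where we first look for a simpler solution that uses fewer elements, drawn from just two divisor groups, before moving on to the much larger set of combinations. This is also why the function tries the largest divisors first.

I was able to find a solution for your last list using this approach (combined with the previous logic). Note: this replaces findSum2().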
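
As a rough illustration of the filtering step only (not the full strategy), the sketch below keeps the elements that share a divisor with the target. The helper name and the sample data are hypothetical. Note that this is a first-pass heuristic: the discarded elements can still be part of a valid solution, which is why a fallback to the full list is needed.

```python
from math import gcd

def multiples_of_shared_divisor(target, numbers, divisor):
    """Keep only the multiples of `divisor`, provided the target is one too."""
    if target % divisor != 0:
        return list(numbers)      # the divisor says nothing useful about this target
    return [n for n in numbers if n % divisor == 0]

numbers = [1000, 3000, 7, 13, 5000, 11]   # made-up data for illustration
target = 9000
k = gcd(target, 1000)                      # 1000
print(multiples_of_shared_divisor(target, numbers, k))   # [1000, 3000, 5000]
```

Searching the reduced list first is cheap, and when it contains a solution (here 1000 + 3000 + 5000) the larger search never has to run.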

from math import gcd
from itertools import combinations
def gcdSubSum(S,A):
    divs   = set( gcd(a,S) for a in A if isinstance(a,int)).difference([1])
    prevDivs = set()
    while prevDivs != divs:
        prevDivs = divs
        for a,b in combinations(divs,2):
            g = gcd(a,b)
            if g == 1 : continue
            divs.discard(a)
            divs.discard(b)
            divs.add(g)
    divs   = sorted(divs,reverse=True)
    combos = sorted(combinations(divs,2),key=lambda a:-min(a))
    for divCombo in combos:
        subA = [ a for a in A if any(a%d==0 for d in divCombo) ]
        result  = findSum(S,subA)
        if result is not None:return result
    return findSum(S,A)

C = [11000,11000,11000,1500,58272,76000,260669,2881,-3472,1591460,633959, 
 1298377,897946,1912536,35166,46888,46888,65190,16000,80000,-9175476,
 51950,-51950,428546,1693196,-18378,-9800,-18820,-3087,3087,30000,
 18378,18820,9800]
targetC = 290670

gcdSubSum(targetC,C) # 0.9 second

# [-51950, -9800, 11000, 11000, 11000, 16000, 18378, 18820, 51950, 58272, 76000, 80000]

This new approach was even able to solve some deliberately difficult test cases that I created.

# Needs to combine 530 elements out of 1280 to reach the target:

targetD = 123456
D = [2**i for i in range(10)]*128 
gcdSubSum(targetD,D)                # 0.1 second


# must use several negatives to fill the gap between a single
# larger positive number and the other, insufficient positives
# (52 out of 123).

targetE = 123456
E = [2**i for i in range(6,16)] + [1-2**i for i in range(1,15)]*8 + [2**18]
gcdSubSum(targetE,E)       # 10 seconds !!!

# [ -16383, -16383, -16383, -16383, -16383, -16383, -16383, -16383,
#   -4095, -4095, -4095, -4095, -4095, -4095, -4095, -4095, -1023,
#   -1023, -1023, -1023, -1023, -1023, -1023, -1023, -255, -255,
#   -255, -255, -255, -255, -255, -255, -63, -63, -63, -63, -63,
#   -63, -63, -15, -3, -3, -3, -3, -3, -3, -3, -3,
#   1024, 2048, 32768, 262144]

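EDIT3 Iterating over sums...

These algorithms are not well suited to finding all the subsets for a sum, because they are optimized to find a single solution. To generate all the subsets I had to approach the problem differently. This new function is a generator, and it sometimes delivers the first solution faster than gcdSubSum(). It does have a limitation, though: it only works with integer values. You can get around this by multiplying all the elements and the target sum by a number large enough to make them integers, then dividing the results: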

def iSubSum(S,A,maxSize=None,sumA=None, oddCount=None):
    if maxSize is None: maxSize = len(A)
    if maxSize==0 or not A : return
    seen = set()        
    def newResult(result): # ensure distinct results
        result = sorted(result)
        return None if tuple(result) in seen else result            
    def addResult(result):
        if result: seen.add(tuple(result))

    if sumA     is None: sumA = sum(A)
    if oddCount is None: oddCount = sum(a&1 for a in A)
    if len(A) > maxSize:
        sA  = sorted(A)
        maxSumA  = sum([a for a in sA[-maxSize:] if a>0] or [max(A)])
        minSumA  = sum([a for a in sA[:maxSize]  if a<0] or [min(A)])
    else:
        maxSumA  = sum([a for a in A if a>0] or [max(A)])
        minSumA  = sum([a for a in A if a<0] or [min(A)])

    if S < minSumA or S > maxSumA: return      # sum unreachable
    if sumA == S and len(A)<=maxSize :
        yield A                                # all values form the sum
        addResult(sorted(A))
    elif  S in A  :
        yield [S]                              # one of the items is the sum
        addResult([S])
    if len(A) == 1 or maxSize==1: return       # if only one, we're done
    if S&1 and not oddCount: return            # need at least one odd for odd sum

    # remove elements that are beyond target range
    mA = [a for a in A if (minSumA<=a or minSumA+a<=S) and (a>=0 or maxSumA+a>=S)]
    if len(mA) < len(A):
        for result in iSubSum(S,mA,maxSize): yield result
        return        

    # even target: process even elements (divide problem by 2)
    if S&1 == 0:
        evens = [a//2 for a in A if a&1 == 0]
        for result in iSubSum(S//2,evens,maxSize):
            result = newResult([r*2 for r in result])
            if result: yield result
            addResult(result)
        if oddCount < 2 : return # need 2+ odd elements for even sum               

    #process odd elements (recursing until only evens remains)
    subA     = A.copy()
    for index,item in enumerate(reversed(A),1-len(A)):
        if not item&1: continue
        del subA[-index]
        oddCount -= 1
        sumA     -= item
        for result in iSubSum(S-item,subA,maxSize-1,sumA,oddCount):
            result = newResult(result + [item])
            if result: yield result
            addResult(result)

To get the first solution, call iSubSum inside the next() function:

next(iSubSum(targetA,A),None) # 1.869 sec. (SLOWER than gcdSubSum 0.042)
next(iSubSum(targetB,B),None) # too long   (SLOWER than gcdSubSum 0.252)
next(iSubSum(targetC,C),None) # 0.003 sec. (FASTER than gcdSubSum 0.936)
next(iSubSum(targetD,D),None) # 0.006 sec. (FASTER than gcdSubSum 0.113)
next(iSubSum(targetE,E),None) # 0.003 sec. (FASTER than gcdSubSum 10.34)

Unfortunately, iSubSum() does not perform well on the monster list (targetB / B), so it could still use more work.

To get all solutions, you can use a for loop or build a list:

for solution in iSubSum(446,listB): 
    print(solution) 

[64, 156, 226]
[23, 55, 64, 304]

solutions = list(iSubSum(targetC,C)) # 1.8 seconds
print(solutions) 

[-9800, 11000, 11000, 11000, 16000, 18378, 18820, 58272, 76000, 80000]
[-51950, -9800, 11000, 11000, 11000, 16000, 18378, 18820, 51950, 58272, 76000, 80000]
[-51950, -9800, -3472, 1500, 11000, 11000, 16000, 18378, 18820, 35166, 46888, 51950, 65190, 80000]
[-51950, -18820, -9800, -3472, 1500, 9800, 11000, 11000, 11000, 18378, 18820, 30000, 35166, 46888, 46888, 58272, 76000]
[-51950, -18820, -3472, 1500, 11000, 11000, 11000, 18378, 18820, 30000, 35166, 46888, 46888, 58272, 76000]
[-51950, -9800, -3472, 1500, 9800, 11000, 11000, 11000, 18378, 30000, 35166, 46888, 46888, 58272, 76000]
[-51950, -3472, 1500, 11000, 11000, 11000, 18378, 30000, 35166, 46888, 46888, 58272, 76000]
[-9800, -3472, 1500, 11000, 11000, 16000, 18378, 18820, 35166, 46888, 65190, 80000]
[-18820, -18378, -9800, -3472, 11000, 11000, 16000, 30000, 51950, 65190, 76000, 80000]
[-9800, -3087, 3087, 11000, 11000, 11000, 16000, 18378, 18820, 58272, 76000, 80000]
[-51950, -9800, -3087, 3087, 11000, 11000, 11000, 16000, 18378, 18820, 51950, 58272, 76000, 80000]
[-51950, -9800, -3472, -3087, 1500, 3087, 11000, 11000, 16000, 18378, 18820, 35166, 46888, 51950, 65190, 80000]
[-51950, -18820, -9800, -3472, -3087, 1500, 3087, 9800, 11000, 11000, 11000, 18378, 18820, 30000, 35166, 46888, 46888, 58272, 76000]
[-51950, -18820, -3472, -3087, 1500, 3087, 11000, 11000, 11000, 18378, 18820, 30000, 35166, 46888, 46888, 58272, 76000]
[-51950, -9800, -3472, -3087, 1500, 3087, 9800, 11000, 11000, 11000, 18378, 30000, 35166, 46888, 46888, 58272, 76000]
[-51950, -3472, -3087, 1500, 3087, 11000, 11000, 11000, 18378, 30000, 35166, 46888, 46888, 58272, 76000]
[-9800, -3472, -3087, 1500, 3087, 11000, 11000, 16000, 18378, 18820, 35166, 46888, 65190, 80000]
[-18820, -18378, -9800, -3472, -3087, 3087, 11000, 11000, 16000, 30000, 51950, 65190, 76000, 80000]
[-51950, -18820, -18378, -9800, -3087, 1500, 2881, 3087, 11000, 18378, 30000, 65190, 260669]
[-51950, -18820, -9800, -3087, 1500, 2881, 3087, 11000, 30000, 65190, 260669]
[-9800, -3472, 2881, 3087, 11000, 18378, 18820, 46888, 46888, 76000, 80000]
[-51950, -18378, -9800, 1500, 2881, 3087, 11000, 11000, 18378, 18820, 30000, 35166, 46888, 46888, 65190, 80000]
[-18820, -18378, -3472, 2881, 3087, 9800, 11000, 11000, 18378, 35166, 46888, 51950, 65190, 76000]
[-51950, -9800, -3472, 2881, 3087, 11000, 18378, 18820, 46888, 46888, 51950, 76000, 80000]
[-51950, -18820, -18378, -9800, 1500, 2881, 3087, 9800, 16000, 30000, 46888, 58272, 65190, 76000, 80000]
[-51950, -18820, -18378, 1500, 2881, 3087, 16000, 30000, 46888, 58272, 65190, 76000, 80000]
[-51950, -9800, 1500, 2881, 3087, 11000, 11000, 18820, 30000, 35166, 46888, 46888, 65190, 80000]
[-18820, -3472, 2881, 3087, 9800, 11000, 11000, 35166, 46888, 51950, 65190, 76000]
[-18378, -9800, -3472, -3087, 1500, 2881, 9800, 18378, 18820, 30000, 46888, 51950, 65190, 80000]
[-18378, -3472, -3087, 1500, 2881, 18378, 18820, 30000, 46888, 51950, 65190, 80000]
[-18378, -9800, -3472, -3087, 2881, 9800, 11000, 46888, 46888, 51950, 76000, 80000]
[-18378, -3472, -3087, 2881, 11000, 46888, 46888, 51950, 76000, 80000]
[-18820, -18378, -9800, -3472, -3087, 2881, 9800, 11000, 18820, 46888, 46888, 51950, 76000, 80000]
[-18820, -18378, -3472, -3087, 2881, 11000, 18820, 46888, 46888, 51950, 76000, 80000]
[-9800, -3472, -3087, 1500, 2881, 9800, 18820, 30000, 46888, 51950, 65190, 80000]
[-3472, -3087, 1500, 2881, 18820, 30000, 46888, 51950, 65190, 80000]
[-51950, -18820, -18378, -9800, 1500, 2881, 11000, 18378, 30000, 65190, 260669]
[-51950, -18820, -9800, 1500, 2881, 11000, 30000, 65190, 260669]

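EDIT4 Duplicates and indexes

To get the indexes of the values, I suggest working backwards from the distinct solutions produced by iSubSum. Here is a function that finds the various index combinations for a given solution: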

from itertools import product
from collections import defaultdict
def comboIndexes(A,R): # return index combinations in A matching values in R
    idx = defaultdict(list)
    for i,a in enumerate(A): idx[a].append(i)
    return set(tuple(sorted(combo)) for combo in product(*[idx[r] for r in R]) if len(combo)==len(set(combo)))

You can use it on the output of iSubSum, so that no processing complexity is added to the iSubSum function itself.

For example:

listC = [10,10,40,40,40,50]
for solution in iSubSum(100,listC):
    for indexes in comboIndexes(listC,solution):
        print(indexes,solution)

(0, 4, 5) [10, 40, 50]
(0, 3, 5) [10, 40, 50]
(1, 4, 5) [10, 40, 50]
(1, 3, 5) [10, 40, 50]
(0, 2, 5) [10, 40, 50]
(1, 2, 5) [10, 40, 50]
(0, 1, 2, 4) [10, 10, 40, 40]
(0, 1, 3, 4) [10, 10, 40, 40]
(0, 1, 2, 3) [10, 10, 40, 40]

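EDIT5 Using iSubSum to compute common sums between two lists

In most cases, iSubSum can be used as a mechanism to get the common sums between two lists (the original question). The approach is to build a single list out of the two, with the values of one of them negated. If there is a common sum, the merged list should be able to form a sum of zero.

Here is an example: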

def iCommonSum(A,B):
    if any( -a in B for a in A):
        return None # value conflict (algorithm TBD)
    for solution in iSubSum(0,[-a for a in A]+B):
        sA = [-a for a in solution if -a in A]
        sB = [b for b in solution if b in B]
        yield sA,sB

listA = [455, 698, 756, 3,56, -9]
listB = [526,55,943,156,531,304,618,911,598,498,268,926,899,898,131,966,303,936,509,67,976,639,74,935,23,226,422,280,64,975,583,596,583]

for sA,sB in iCommonSum(listA,listB):
    print(sum(sA),sA,sB)

1454 [756, 698] [226, 280, 422, 526]
1454 [756, 698] [74, 156, 304, 422, 498]
1510 [756, 698, 56] [64, 74, 156, 268, 422, 526]
1510 [756, 698, 56] [64, 422, 498, 526]
1454 [756, 698] [156, 280, 422, 596]
754 [698, 56] [64, 268, 422]
1510 [756, 698, 56] [64, 74, 226, 268, 280, 598]
...

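The only restriction is that an element of one list must not be equal to the negation of an element of the other list. Note that this issue only arises if the lists are allowed to contain negative numbers.

Creating a version of iSubSum that returns indexes instead of values would remove this ambiguity (continued in a new answer).

Answer 3: (score: 0)

Not sure if you are still looking for an answer, but you can actually extend the coin-change approach that you used for the 2nd and 3rd sub-cases mentioned in the OP. The idea here is to make use of the memoization table created by the dynamic programming approach. Note that you need positive numbers (they can be floats) in both arrays to get an optimal solution here.

Consider two arrays: [4,3,5,1] and [2,6,4,3]

Let's build the memoization table for the first array using the coin-change approach, where the maximum sum required is the sum of all elements of the array, which is 13 in this case. The memoization table looks like this: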

    0   1   2   3   4   5   6   7   8   9   10  11  12  13
4   T   F   F   F   T   F   F   F   F   F   F   F   F   F
3   T   F   F   T   T   F   F   T   F   F   F   F   F   F
5   T   F   F   T   T   T   F   T   T   T   F   F   T   F
1   T   T   F   T   T   T   T   T   T   T   T   F   T   T

For the second array the total is 15, and the table looks like this:

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
2   T   F   T   F   F   F   F   F   F   F   F   F   F   F   F   F
6   T   F   T   F   F   F   T   F   T   F   F   F   F   F   F   F
4   T   F   T   F   T   F   T   F   T   F   T   F   T   F   F   F
3   T   F   T   T   T   T   T   T   T   T   T   T   T   T   F   T

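If you look at the last row of both tables, you can easily conclude that for any column whose value is T, that column number can be expressed as the sum of some elements of the given array. How do you find those elements? You can do a simple backtrack through the existing memoization table to get all possible ways of reaching that particular column's sum: pick any column whose cell value is T in the last row, then trace back through the T values for that column, adjusting the sum accordingly.

Now to the main part: how to know which subsequences give you the same sum (case 4 of the OP). Once you have formed the subsequences above for all possible sums using the last row, you can compare the last rows of the two memoization tables column by column to find which sums are actually formed in both arrays, and return the associated subsequences stored against those sums. For example, for the two arrays above, the common sums are [3,4,5,6,7,8,9,10,12,13], and using the approach above you can map these sums to the lists of elements producing them and return those as the result.

The time complexity is O(n1*s1 + n2*s2), where ni and si are the number of elements and the sum of array ai. I think you could also extend this approach to k given arrays.

If anyone finds any flaw here, please let me know.

Answer 4: (score: 0)

Building on the work done in my previous answer, this final solution uses a variant of the subset sum function (iSubSum) that returns indexes instead of values. The approach works by merging the two lists after negating the elements of one of them. Looking for a zero sum in the combined list then yields combinations of elements that form a common sum. The elements making up that zero sum can then be separated according to the list they came from.

Here is the modified version of iSubSum() that returns indexes (note that it only handles integers, but it can, and must, support negative values):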
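
The table construction and the backtracking described above could look roughly like this, assuming positive integers. The function names are hypothetical, and one_subset() recovers a single subset per sum rather than all of them.

```python
def reachable_sums(numbers):
    """table[i][s] is True if some subset of numbers[:i] sums to s."""
    total = sum(numbers)
    table = [[False] * (total + 1) for _ in range(len(numbers) + 1)]
    table[0][0] = True                      # the empty subset sums to 0
    for i, n in enumerate(numbers, 1):
        for s in range(total + 1):
            table[i][s] = table[i - 1][s] or (s >= n and table[i - 1][s - n])
    return table

def one_subset(numbers, table, s):
    """Backtrack through the table to recover one subset summing to s."""
    subset = []
    for i in range(len(numbers), 0, -1):
        if not table[i - 1][s]:             # s is unreachable without numbers[i-1]
            subset.append(numbers[i - 1])
            s -= numbers[i - 1]
    return subset

a, b = [4, 3, 5, 1], [2, 6, 4, 3]
ta, tb = reachable_sums(a), reachable_sums(b)
common = [s for s in range(1, min(sum(a), sum(b)) + 1) if ta[-1][s] and tb[-1][s]]
print(common)                               # [3, 4, 5, 6, 7, 8, 9, 10, 12, 13]
print(one_subset(a, ta, 12), one_subset(b, tb, 12))   # [5, 3, 4] [4, 6, 2]
```

The last rows ta[-1] and tb[-1] correspond to the bottom rows of the two tables shown above.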

def iSubSumIndexes(S,A,sumA=None, oddCount=None):
    if not A : return
    if not isinstance(A[0],tuple):
        A = [(a,i) for i,a in enumerate(A)]
    seen = set()        
    def newResult(result): # ensure distinct results
        result = sorted(result)
        return None if tuple(result) in seen else result            
    def addResult(result):
        if result: seen.add(tuple(result))

    if sumA     is None: sumA = sum(a for a,i in A)
    if oddCount is None: oddCount = sum(a&1 for a,i in A)
    maxSumA  = sum([a for a,i in A if a>0] or [max(A)[0]])
    minSumA  = sum([a for a,i in A if a<0] or [min(A)[0]])

    if S < minSumA or S > maxSumA: return      # sum unreachable
    if sumA == S:
        result = newResult([i for a,i in A])
        yield result                           # all values form the sum
        addResult(result)
    else:
        for a,i in A:
            if a != S: continue
            yield [i]                          # one of the items is the sum
            addResult([i])
    if len(A) == 1: return                     # if only one, we're done
    if S&1 and not oddCount: return            # need at least one odd for odd sum

    # remove elements that are beyond target range
    mA = [(a,i) for a,i in A if (minSumA<=a or minSumA+a<=S) and (a>=0 or maxSumA+a>=S)]
    if len(mA) < len(A):
        for result in iSubSumIndexes(S,mA): yield result
        return        

    # even target: process even elements (divide problem by 2)
    if S&1 == 0:
        evens = [(a//2,i) for a,i in A if not a&1]
        for result in iSubSumIndexes(S//2,evens):
            result = newResult(result)
            if result: yield result
            addResult(result)
        if oddCount < 2 : return # need 2+ odd elements for even sum               

    #process odd elements (recursing until only evens remains)
    subA     = A.copy()
    for index,(odd,oddIndex) in enumerate(reversed(A),1-len(A)):
        if not odd&1: continue
        del subA[-index]
        oddCount -= 1
        sumA     -= odd
        for result in iSubSumIndexes(S-odd,subA,sumA,oddCount):
            result = newResult(result + [oddIndex])
            if result: yield result
            addResult(result)

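This function can then be used inside another function that combines the lists with the required negation and returns the indexes into their original lists. Using indexes avoids any ambiguity about which list an element originally belonged to.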

def iCommonSumIndexes(A,B):
    for solution in iSubSumIndexes(0,[-a for a in A]+B):
        iA = [i for i in solution if i < len(A)]
        iB = [i-len(A) for i in solution if i >= len(A)]
        yield iA,iB

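To use this function, you will need to work with the indexes in its output rather than the actual values, but that is a simple indirection easily done in a list comprehension. A side benefit is that you can use the indexes to map to more complex objects that contain the values (e.g. accounting transactions):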

listA = [455, 698, 756, 3,56, -9]
listB = [526,55,943,156,531,304,618,911,598,498,268,926,899,898,131,966,303,936,509,67,976,639,74,935,23,226,422,280,64,975,583,596,583]

for iA,iB in iCommonSumIndexes(listA,listB):
    sA = [listA[i] for i in iA]
    sB = [listB[i] for i in iB]
    print(sum(sA),sA,sB)

1454 [698, 756] [526, 226, 422, 280]
1454 [698, 756] [156, 304, 498, 74, 422]
1510 [698, 756, 56] [526, 156, 268, 74, 422, 64]
1510 [698, 756, 56] [526, 498, 422, 64]
1454 [698, 756] [156, 422, 280, 596]
754 [698, 56] [268, 422, 64]
1510 [698, 756, 56] [598, 268, 74, 226, 280, 64]
1510 [698, 756, 56] [156, 618, 598, 74, 64]
756 [756] [618, 74, 64]
...
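
To illustrate the mapping to richer objects, here is a small sketch with a made-up Transaction record. The index pair is hard-coded in the shape yielded by iCommonSumIndexes so that the snippet runs on its own; all names and data are hypothetical.

```python
from collections import namedtuple

Transaction = namedtuple("Transaction", "ref amount")   # hypothetical record type

ledger_a = [Transaction("A-1", 698), Transaction("A-2", 756), Transaction("A-3", 56)]
ledger_b = [Transaction("B-1", 526), Transaction("B-2", 226),
            Transaction("B-3", 422), Transaction("B-4", 280)]

# The amounts are what would be passed to iCommonSumIndexes(amounts_a, amounts_b).
amounts_a = [t.amount for t in ledger_a]
amounts_b = [t.amount for t in ledger_b]

# One (iA, iB) pair as the generator would yield it (assumed, not computed here).
iA, iB = [0, 1], [0, 1, 2, 3]

matched_a = [ledger_a[i] for i in iA]       # indexes lead straight back to the records
matched_b = [ledger_b[i] for i in iB]
total_a = sum(t.amount for t in matched_a)
total_b = sum(t.amount for t in matched_b)
print(total_a, [t.ref for t in matched_a], [t.ref for t in matched_b])
```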

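The function returns every possible combination of indexes that produces a common sum. This includes combinations of elements from one list that add up to zero and therefore match the empty subset of the other list.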

You can filter the output if you only need:

  • unique combinations of values
  • distinct left/right patterns
  • non-empty subsets.
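
A filter covering the first and last of these could be sketched as follows. The function name is hypothetical, and the input is any iterable of (iA, iB) index pairs in the shape produced by iCommonSumIndexes.

```python
def filter_solutions(pairs, list_a, list_b):
    """Yield (iA, iB) pairs with non-empty sides and value combinations not seen before."""
    seen = set()
    for iA, iB in pairs:
        if not iA or not iB:
            continue                        # one side sums to zero on its own
        key = (tuple(sorted(list_a[i] for i in iA)),
               tuple(sorted(list_b[i] for i in iB)))
        if key in seen:
            continue                        # same values reached through other indexes
        seen.add(key)
        yield iA, iB

list_a, list_b = [5, 5, -5], [5, 7]
raw = [([0], [0]), ([1], [0]), ([0, 2], []), ([0], [0])]   # made-up generator output
print(list(filter_solutions(raw, list_a, list_b)))          # [([0], [0])]
```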

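Performance is driven by the iSubSumIndexes algorithm, which is usually fast but does have its limits.

If you find a better subset sum algorithm that can be adapted to return indexes, the same approach will work (by replacing iSubSumIndexes in iCommonSumIndexes with the better function). In other words, the common sum problem can be reduced to the more general subset sum problem.

EDIT Improved version of iSubSumIndexes

Here is an improved version of the function that extends its odd/even concept to prime factors. In this extended view, the "even" concept corresponds to numbers that are multiples of a base prime, and the "odd" concept to any number that is not a multiple of that base prime. As with actual even/odd arithmetic, adding "even" values always produces an "even" result. Also, at least two "odd" values must be added together to produce an "even" result. For these prime-based "odds", more than two may be needed, but that does not matter to the algorithm.

Using only a few small primes, the function systematically outperforms gcdSubSum, even on the monster list I tested earlier. It is the best so far.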
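
The early-exit test that this generalization allows can be shown in isolation. This is only a sketch of the divisibility argument, with a hypothetical function name, not an excerpt from the function itself.

```python
def can_rule_out(target, numbers, f):
    """True if divisibility by f alone proves that no subset of numbers sums to target."""
    non_multiples = sum(1 for n in numbers if n % f != 0)
    if target % f != 0:
        # multiples of f can only add up to a multiple of f
        return non_multiples == 0
    return False          # target is a multiple of f: this test alone proves nothing

print(can_rule_out(10, [3, 6, 9, 12], 3))   # True: only "evens", but an "odd" target
print(can_rule_out(9, [3, 6, 9, 12], 3))    # False
print(can_rule_out(10, [3, 6, 9, 4], 3))    # False: 4 is an "odd" for the prime 3
```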

def iSubSumIndexes(S,A):

    if not A : return
    if not isinstance(A[0],tuple):
        A = [(a,i) for i,a in enumerate(A)]
    seen = set()        
    def newResult(result): # ensure distinct results
        result = sorted(result)
        return None if tuple(result) in seen else result            
    def addResult(result):
        if result: seen.add(tuple(result))

    sumA     = sum(a for a,i in A)
    maxSumA  = sum([a for a,i in A if a>0] or [max(A)[0]])
    minSumA  = sum([a for a,i in A if a<0] or [min(A)[0]])

    if S < minSumA or S > maxSumA: return      # sum unreachable
    if sumA == S:
        result = sorted(i for a,i in A)
        yield result                           # all values form the sum
        addResult(result)
    else:
        for a,i in A:
            if a == S: 
               yield [i]                       # one of the items is the sum
               addResult([i])
    if len(A) == 1: return                     # if only one, we're done

    # remove elements that are beyond target range
    mA = [(a,i) for a,i in A if (minSumA<=a or minSumA+a<=S) and (a>=0 or maxSumA+a>=S)]
    if len(mA) < len(A):
        for result in iSubSumIndexes(S,mA): yield result
        return

    # Apply even/odd concept to prime factors       
    for f in [19,17,13,11,7,5,3,2]:
        oddCount = sum(a%f != 0 for a,i in A)
        if S%f and not oddCount: return # need at least one odd for odd sum
        if f > 2 and oddCount > 1:
           if oddCount > len(A)//2: continue

        # even target: process even elements (divide problem by f)
        if S%f == 0:
            evens = [(a//f,i) for a,i in A if  a%f == 0]
            for result in iSubSumIndexes(S//f,evens):
                result = newResult(result)
                if result: yield result
                addResult(result)
            if oddCount < 2 : return # need 2+ odd elements for even sum               

        #process odd elements (recursing until only evens remains)
        subA     = A.copy()
        for index,(item,ii) in enumerate(reversed(A),1-len(A)):
            if item%f == 0: continue
            del subA[-index]
            for result in iSubSumIndexes(S-item,subA):
                result = newResult(result + [ii])
                if result: yield result
                addResult(result)
        break

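It may be necessary to adjust the list of primes depending on the nature of the data. However, simply adding more primes to the list can bring diminishing returns, and in some cases may even degrade performance.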