Question

I want code that returns the sum of all similar sequences in two strings. I wrote the following code, but it only returns one of them:

from difflib import SequenceMatcher
a='Apple Banana'
b='Banana Apple'
def similar(a,b):
    c = SequenceMatcher(None,a.lower(),b.lower()).get_matching_blocks()
    return sum( [c[i].size if c[i].size>1 else 0 for i in range(0,len(c)) ] )
print(similar(a, b))

The output is 6.

I want it to be 11.

Answer 1

If we edit your code like this, it tells us where the 6 comes from:

from difflib import SequenceMatcher
a='Apple Banana'
b='Banana Apple'
def similar(a,b):
    c = SequenceMatcher(None,a.lower(),b.lower()).get_matching_blocks()
    for block in c:
        print("a[%d] and b[%d] match for %d elements" % block)
similar(a, b)

a[6] and b[0] match for 6 elements
a[12] and b[12] match for 0 elements
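
The same blocks can also be inspected directly. A minimal sketch in Python 3 (the variable names are mine, not from the question): each block is a named tuple with fields a, b and size, and the final block is always a zero-length sentinel, which is why the second line of output above reports 0 elements.

```python
from difflib import SequenceMatcher

a = 'Apple Banana'.lower()
b = 'Banana Apple'.lower()

matcher = SequenceMatcher(None, a, b)
blocks = matcher.get_matching_blocks()

for m in blocks:
    # m.a and m.b are start offsets into a and b; m.size is the match length
    print(m, repr(a[m.a:m.a + m.size]))

# Summing the block sizes reproduces the 6 from the question
total = sum(m.size for m in blocks)
print(total)
```

Only 'banana' shows up as a non-empty block, so the sum cannot exceed 6 with a single call.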

Answer 2

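get_matching_blocks() starts from the longest contiguous matching subsequence and then only looks for further matches to the left and right of it. Here the longest matching subsequence is 'banana' in both strings, with length 6. 'apple' sits before it in one string and after it in the other, so it is never matched. Hence the result is 6.

Try this instead: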

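from difflib import SequenceMatcher

def similar(a, b):
    c = 'something'  # Initialize this to anything so the while condition passes the first time
    total = 0

    while len(c) != 1:
        # The last block is always a zero-size sentinel, so len(c) == 1 means no match
        c = SequenceMatcher(lambda x: x == ' ', a.lower(), b.lower()).get_matching_blocks()

        sizes = [i.size for i in c]
        i = sizes.index(max(sizes))  # index of the longest matching block
        total += max(sizes)

        # Cut the matched part out of both strings before matching again
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]

    return total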

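This "subtracts" the matching part of the strings and matches them again, until len(c) is 1, which happens when there are no matches left.

However, this script does not ignore spaces. To do that, I used the suggestion from this other SO answer: just preprocess the strings before passing them to the function, like so: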

a = 'Apple Banana'.replace(' ', '')
b = 'Banana Apple'.replace(' ', '')

You can also include this part inside the function.
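
One way to fold the preprocessing into the function, as a sketch rather than a definitive version: the name similar_ignoring_spaces is hypothetical, and it assumes plain spaces are the only whitespace that should be ignored.

```python
from difflib import SequenceMatcher

def similar_ignoring_spaces(a, b):
    # Hypothetical helper: normalise case and drop spaces up front
    a = a.lower().replace(' ', '')
    b = b.lower().replace(' ', '')
    total = 0
    while True:
        blocks = SequenceMatcher(None, a, b).get_matching_blocks()
        # The list always ends with a zero-size sentinel, so max() is safe
        longest = max(blocks, key=lambda m: m.size)
        if longest.size == 0:
            break  # nothing left to match
        total += longest.size
        # Remove the matched part from both strings and match again
        a = a[:longest.a] + a[longest.a + longest.size:]
        b = b[:longest.b] + b[longest.b + longest.size:]
    return total

print(similar_ignoring_spaces('Apple Banana', 'Banana Apple'))  # 11
```

Note that removing a block joins the text on either side of it, so later iterations can match across the cut.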

Answer 3

I made a small change to your code and it works like a charm. Thanks, @Antimony.


Python: finding similar sequences in strings
