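Computing the overlap between two words in both directions in Python

Time: 2017-05-05 15:18:06

Tags: python

So I have two words. I want to write a function that finds the largest overlap from each word to the other, i.e. the longest ending of the first word that is also a beginning of the second. For example: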

words = ['AAB', 'BAA']
find_overlap('AAB', 'BAA')

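should output B and size 1, and:

find_overlap('BAA', 'AAB')

should output AA and size 2. Any suggestions on how to do this?

Edit: So I tried difflib.SequenceMatcher from Python, but I don't understand the output.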

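import difflib

s1 = "AAB"
s2 = "BAA"
s = difflib.SequenceMatcher(None, s1, s2)
# find_longest_match returns the longest block common to both strings,
# located anywhere in them, as (start in s1, start in s2, length)
pos_a, pos_b, size = s.find_longest_match(0, len(s1), 0, len(s2))
print(pos_a, pos_b, size)  # prints: 0 1 2 (the block "AA")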
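A small sketch of why that output is confusing here: find_longest_match looks for the longest common substring anywhere in the two strings, so it reports "AA" with size 2 in both directions, while the overlap asked for ("AAB" into "BAA") is only "B". The helper name longest_common_block is illustrative and not part of difflib.

```python
import difflib


def longest_common_block(s1, s2):
    """Hypothetical helper: longest substring shared by s1 and s2, anywhere in either."""
    matcher = difflib.SequenceMatcher(None, s1, s2)
    # Explicit bounds keep this working on Python versions before 3.9
    return matcher.find_longest_match(0, len(s1), 0, len(s2))


print(longest_common_block("AAB", "BAA"))  # Match(a=0, b=1, size=2)
print(longest_common_block("BAA", "AAB"))  # Match(a=1, b=0, size=2)
```

Both calls find "AA", which is not a suffix of "AAB", so this result alone cannot answer the question.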

1 Answer:

Answer 0 (score: 0)

For short strings, a naive approach may be sufficient. For example, @Moberg's idea can be implemented like this:

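def largest_overlap(s1, s2):
    # Return the longest suffix of s1 that is also a prefix of s2
    n = min(len(s1), len(s2))
    # Try the longest candidate first, so the first hit is the largest overlap
    for i in range(n, 0, -1):
        if s2.startswith(s1[-i:]):
            return s1[-i:]
    return ''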

Some test cases:

print("BAA, AAB =>", largest_overlap("BAA", "AAB"))
print("AAB, BAA =>", largest_overlap("AAB", "BAA"))
print("AAA, BB =>", largest_overlap("AAA", "BB"))
print("AA, AABB =>", largest_overlap("AA", "AABB"))
print("hello world, world peace =>", largest_overlap("hello world", "world peace"))

Output:

BAA, AAB => AA
AAB, BAA => B
AAA, BB => 
AA, AABB => AA
hello world, world peace => world
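The question also asks for the size of the overlap in each direction. A minimal sketch of a wrapper that returns both, assuming the naive function above; the name find_overlap is taken from the question, and the wrapper itself is an illustration rather than part of the answer.

```python
def largest_overlap(s1, s2):
    # Longest suffix of s1 that is also a prefix of s2 (naive approach)
    n = min(len(s1), len(s2))
    for i in range(n, 0, -1):
        if s2.startswith(s1[-i:]):
            return s1[-i:]
    return ''


def find_overlap(s1, s2):
    """Hypothetical wrapper: (overlap text, overlap size) from s1 into s2."""
    overlap = largest_overlap(s1, s2)
    return overlap, len(overlap)


# Check both directions for the pair from the question
for a, b in [('AAB', 'BAA'), ('BAA', 'AAB')]:
    print(a, b, find_overlap(a, b))
# AAB BAA ('B', 1)
# BAA AAB ('AA', 2)
```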

For longer strings, you may need a more sophisticated algorithm, similar to this
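The link in the answer did not survive, so the following is not necessarily the algorithm it pointed to. As one illustrative option, the Knuth-Morris-Pratt (KMP) prefix function finds the same suffix/prefix overlap in O(len(s1) + len(s2)) time, instead of the naive version's repeated startswith checks. A sketch, with hypothetical function names:

```python
def prefix_function(p):
    """pi[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def largest_overlap_kmp(s1, s2):
    """Longest suffix of s1 that is also a prefix of s2, using KMP matching."""
    if not s1 or not s2:
        return ''
    pi = prefix_function(s2)
    q = 0  # number of leading characters of s2 matched so far
    for ch in s1:
        if q == len(s2):
            # All of s2 matched; fall back so matching can continue
            q = pi[q - 1]
        while q > 0 and ch != s2[q]:
            q = pi[q - 1]
        if ch == s2[q]:
            q += 1
    # After scanning all of s1, q is the length of the longest
    # prefix of s2 that ends at the end of s1
    return s2[:q]


print(largest_overlap_kmp("BAA", "AAB"))  # AA
print(largest_overlap_kmp("AAB", "BAA"))  # B
```

The idea is to run s2 as a KMP pattern over s1 and read off how much of the pattern is matched when the text ends.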