Question

Given 2 strings, find the maximum overlap of characters, in order, between the 2 strings.

For example:

"klccconcertcenter"
"klconventioncenter"

The maximum in-order overlap is "klconetcenter".

Here it is illustrated, with the matched characters in brackets:

[klc]cc[on]c[e]r[tcenter]
[klcon]v[e]n[t]ion[center]

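I actually have a solution, but it is only an approximation. My naive recursive approach does not scale to long strings.

OK, since people are asking for my solution: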

def match_count(phrase1, phrase2):
    """This approximates match_count_recur() because the recursive function does not scale for long strings"""
    MAX_MATCH_COUNT_WIDTH = 15
    if len(phrase1) > MAX_MATCH_COUNT_WIDTH:
        mid = len(phrase1) // 2  # integer division; plain `/` yields a float on Python 3 and breaks slicing
        return match_count(phrase1[:mid], phrase2) + match_count(phrase1[mid:], phrase2)
    return match_count_recur(phrase1, phrase2)


def match_count_recur(phrase1, phrase2):
    """
    Checks the number of characters that intersect (in order) between 2 phrases
    """
    if len(phrase2) < 1: return 0
    if len(phrase1) < 1: return 0
    if len(phrase1) == 1: return 1 if phrase1 in phrase2 else 0
    if len(phrase2) == 1: return 1 if phrase2 in phrase1 else 0
    if phrase1 in phrase2: return len(phrase1)
    if phrase2 in phrase1: return len(phrase2)

    char = phrase1[0]
    current_count = 1 if char in phrase2 else 0
    phrase2_idx = phrase2.index(char) + 1 if char in phrase2 else 0

    no_skip_count = current_count + match_count(phrase1[1:], phrase2[phrase2_idx:])
    skip_count = match_count(phrase1[1:], phrase2)

    return max(no_skip_count, skip_count)

def get_similarity_score(phrase1, phrase2):
    """
    Gets the similarity score of 2 phrases
    """
    phrase1 = phrase1.lower().replace(" ", "")
    phrase2 = phrase2.lower().replace(" ", "")
    shorter_phrase = phrase2
    longer_phrase = phrase1
    if len(phrase1) < len(phrase2):
        shorter_phrase = phrase1
        longer_phrase = phrase2
    return float(match_count(shorter_phrase, longer_phrase)) / float(len(shorter_phrase))

Answer 1

It sounds like you are trying to solve the longest common subsequence problem (wiki page). This has a Python implementation that may be of interest.
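For reference, here is a minimal sketch of the standard dynamic-programming approach to the longest common subsequence. It is not the linked implementation; the function name `lcs` is just an illustrative choice. It runs in O(len(a) * len(b)) time and space, so it avoids the blow-up of the plain recursion.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(lcs("klccconcertcenter", "klconventioncenter"))  # klconetcenter
```

If only the count is needed for a similarity score, `len(lcs(shorter, longer)) / len(shorter)` could stand in for the approximate `match_count` from the question.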

Python also has something built in that implements similar functionality, in the difflib module. In particular, see the difflib.Differ class.
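As a rough sketch of the difflib route, the snippet below uses `difflib.SequenceMatcher` rather than `Differ`, since it works directly on two strings. The helper name `matched_chars` is made up for illustration. Note the assumption here: `SequenceMatcher` aligns the longest contiguous blocks first, so the result is always a common subsequence but is not guaranteed to be the longest one.

```python
from difflib import SequenceMatcher


def matched_chars(a: str, b: str) -> str:
    """Concatenate the blocks that SequenceMatcher aligns between a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() yields (a, b, size) triples in increasing order;
    # the final triple is a zero-length sentinel and contributes nothing.
    return "".join(a[blk.a:blk.a + blk.size] for blk in matcher.get_matching_blocks())


print(matched_chars("klccconcertcenter", "klconventioncenter"))
```

This is convenient for a quick similarity estimate; when the exact maximum matters, a dynamic-programming solution is the safer choice.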

Finding the maximum subsequence between 2 strings
