Find the longest repeated piece in a string (Python)

Time: 2018-01-22 14:56:56

Tags: python string function

I want to write a function called "longest". My doctests look like this (Python):

"""
>>> longest('1211')
1
>>> longest('1212')
2
>>> longest('212111212112112121222222212212112121')
2
>>> longest('1')
0
>>> longest('121')
0
>>> longest('12112')
0

"""

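What I want to achieve is this: in the first case, for example, the string ends with "11", so the 1 is repeated at the back. The repeated part is "1", which is 1 character long, and the function should return that length.

In the second case you get "1212", so the repeated part is "12", which is 2 characters long.

The tricky part in the third case is that the longest run is "2222222", but it does not count because it is not at the front or the back. The answer for that doctest is the repeated "21", which is 2 characters long.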
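One way to read the rule described above, consistent with every doctest in this question, is: find the largest k such that the first k characters are immediately repeated at the start, or the last k characters are immediately repeated at the end. The sketch below checks that directly. The name `longest_edge_repeat` is hypothetical and does not come from the original post.

```python
def longest_edge_repeat(s):
    """Length of the longest block that is immediately repeated at the
    start or at the end of s (0 if there is no such block)."""
    # Try the largest block size first, so the first hit is the answer.
    for k in range(len(s) // 2, 0, -1):
        starts_doubled = s[:k] == s[k:2 * k]
        ends_doubled = s[-2 * k:-k] == s[-k:]
        if starts_doubled or ends_doubled:
            return k
    return 0
```

For example, `longest_edge_repeat('121212222112222112')` returns 6, because the string ends in `222112222112`. The loop is quadratic in the worst case, which should be acceptable for short inputs like the ones here.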

The code I have written so far is as follows:

import re

def repetitions(s):
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(s):
        # Yield (repeated unit, number of repetitions); // keeps the count an integer.
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))


def longest(s):
    """
    >>> longest('1211')
    1
    """
    # Maps each repeated unit (for the first doctest: '1') to the number of times it repeats.
    nummer_hoeveel_keer = dict(repetitions(s))

    if nummer_hoeveel_keer == {}:  # no repetitions at all, so return 0
        return 0

    sleutels = nummer_hoeveel_keer.keys()  # the repeated units; used to find the longest one

    lengtes = {}

    for sleutel in sleutels:
        lengte = len(sleutel)
        lengtes[lengte] = sleutel

    # Keep trying the longest remaining unit until one sits at the very start or very end of s.
    while lengtes != {}:
        maximum_lengte = max(lengtes.keys())

        lengte_sleutel = {v: k for k, v in lengtes.items()}
        x= int(nummer_hoeveel_keer[(lengtes[maximum_lengte])])

        achter  = s[len(s) - maximum_lengte*x:]
        voor = s[:maximum_lengte*x]

        combinatie = lengtes[maximum_lengte]*x

        if achter == combinatie or voor == combinatie:
            return maximum_lengte

        del lengtes[str(maximum_lengte)]
    return 0

When I add the following doctest to this code:

"""
>>> longest('12112')
0
"""

I get a KeyError at the line where I wrote "del lengtes[str(maximum_lengte)]".
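A likely cause, assuming `lengtes` is filled exactly as in the code above: its keys are the integers produced by `len(sleutel)`, while `str(maximum_lengte)` is a string, so the lookup cannot find the key. The small sketch below reproduces this with an illustrative dictionary; the variable `raised` is only there for the demonstration.

```python
# Illustrative state: one candidate unit '1' stored under its length 1.
lengtes = {1: '1'}
maximum_lengte = max(lengtes.keys())   # an int

try:
    del lengtes[str(maximum_lengte)]   # looks up '1' (a str), which is not a key
    raised = False
except KeyError:
    raised = True

del lengtes[maximum_lengte]            # the int key exists, so this succeeds
print(raised, lengtes)
```

Deleting with the integer key (`del lengtes[maximum_lengte]`) avoids the exception, although it does not by itself make the remaining doctests pass.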

Following @theausome's suggestion, I used his code as a basis for further work (see the answer), so my code now looks like this:

def longest(s):
    if len(s) == 1:
        return 0
    longest_patt = []
    k = s[-1]
    longest_patt.append(k)
    for c in s[-2::-1]:
        if c != k:
            longest_patt.append(c)
        else:
            break
    rev_l = list(reversed(longest_patt))
    character = ''.join(rev_l)
    length = len(rev_l)
    s = s.replace(' ','')[:-length]
    if s[-length:] == character:
        return len(longest_patt)
    else:
        return 0

l = longest('1212')  # example input from the doctests
print(l)

Some doctests still fail for me, for example:

>>> longest('211211222212121111111')
3  # I get 1

>>> longest('2111222122222221211221222112211')
4  # I get 1
>>> longest('122211222221221112111')
4  # I get 1
>>> longest('121212222112222112')
6  # I get 1

Does anyone have an idea how to handle or solve this, or perhaps know a more elegant way to approach the problem?
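As a possible regex-based alternative, closer in spirit to the `repetitions` attempt, one could anchor a backreference pattern at the start and at the end of the string. This is only a sketch under the assumption that "repeated" means a block immediately followed by one copy of itself at either edge; the names `DOUBLED_AT_START`, `DOUBLED_AT_END` and `longest_with_regex` are hypothetical.

```python
import re

# Greedy (.+) tries the longest block first, so the first successful
# match is the longest doubled block at that anchor.
DOUBLED_AT_START = re.compile(r'(.+)\1', re.DOTALL)    # used with .match()
DOUBLED_AT_END = re.compile(r'(.+)\1\Z', re.DOTALL)    # used with .search()


def longest_with_regex(s):
    """Length of the longest block doubled at the start or end of s."""
    best = 0
    at_start = DOUBLED_AT_START.match(s)
    if at_start:
        best = max(best, len(at_start.group(1)))
    # search() scans start positions left to right; the leftmost hit
    # that reaches the end of s covers the longest doubled suffix.
    at_end = DOUBLED_AT_END.search(s)
    if at_end:
        best = max(best, len(at_end.group(1)))
    return best
```

Backtracking makes this potentially slow on long inputs, so it is a convenience for short strings rather than an efficient algorithm.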

1 answer:

Answer 0 (score: -1)

Try the following code. It works for the doctests given in your question.

def longest(s):
    if len(s) == 1:
        return 0
    longest_patt = []
    k = s[-1]
    longest_patt.append(k)
    for c in s[-2::-1]:
        if c != k:
            longest_patt.append(c)
        else:
            break
    rev_l = list(reversed(longest_patt))
    character = ''.join(rev_l)
    length = len(rev_l)
    s = s.replace(' ','')[:-length]
    if s[-length:] == character:
        return len(longest_patt)
    else:
        return 0

l = longest('1212')  # example input from the doctests
print(l)

Output:

longest('1211')
1
longest('1212')
2
longest('212111212112112121222222212212112121')
2
longest('1')
0
longest('121')
0
longest('12112')
0