Find the longest unique substring in a string in Python

Time: 2017-10-06 20:27:48

Tags: python python-3.x

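I'm trying to solve the age-old problem (there are many versions of it) of finding the longest substring of a string that contains no repeated characters. I can't work out why my attempt doesn't work: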

def findLongest(inputStr):
    resultSet = []
    substr = []

    for c in inputStr:
        print ("c: ", c)
        if substr == []:
            substr.append([c])
            continue

        print(substr)
        for str in substr:
            print ("c: ",c," - str: ",str,"\n")
            if c in str:
                resultSet.append(str)
                substr.remove(str)
            else:
                str.append(c)
        substr.append([c])



    print("Result set:")
    print(resultSet)
    return max(resultSet, key=len)

print (findLongest("pwwkewambb"))

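When my output reaches the second 'w', the loop doesn't iterate over all of the substr elements. I think I've done something silly, but I can't see what it is, so some guidance would be greatly appreciated! I have a feeling I'll end up answering this myself...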

The beginning of my output:

c:  p
c:  w
[['p']]
c:  w  - str:  ['p']

c:  w
[['p', 'w'], ['w']]
c:  w  - str:  ['p', 'w'] # I expect the next line to say c: w - str: ['w']

c:  k
[['w'], ['w']] # it is like the w was ignored as it is here
c:  k  - str:  ['w']

c:  k  - str:  ['w']
...

Edit

I replaced the for loop with
for idx, str in enumerate(substr):
    print ("c: ",c," - str: ",str,"\n")
    if c in str:
        resultSet.append(str)
        substr[idx] = []
    else:
        str.append(c)

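and it produces the correct result. The only thing is that the emptied lists get filled again with the following characters. That seems a bit pointless; there must be a better way.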

My expected output is kewamb

For example, the output now begins:

c:  p
c:  w
[['p']]
c:  w  - str:  ['p']

c:  w
[['p', 'w'], ['w']]
c:  w  - str:  ['p', 'w']

c:  w  - str:  ['w']

c:  k
[[], [], ['w']]
c:  k  - str:  []

c:  k  - str:  []

c:  k  - str:  ['w']

c:  e
[['k'], ['k'], ['w', 'k'], ['k']]
c:  e  - str:  ['k']

c:  e  - str:  ['k']

c:  e  - str:  ['w', 'k']

c:  e  - str:  ['k']
...

5 Answers:

Answer 0 (score: 2)

I'm not sure of everything that's wrong with your attempt, since it's complicated, but in:

    for str in substr:
        print ("c: ",c," - str: ",str,"\n")
        if c in str:
            resultSet.append(str)
            substr.remove(str)

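you're removing elements from the list while iterating over it: don't do that, it produces unexpected results (the loop skips the element that follows each removed one).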
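A minimal sketch of that effect, reusing the list shapes from the question's trace. The helper names `remove_while_iterating` and `remove_from_copy` are made up for illustration; the fix shown (iterating over a snapshot with `list(...)`) is one common option, not the only one.

```python
def remove_while_iterating(items):
    """Mimics the question's loop: remove from the list being iterated."""
    visited = []
    for item in items:
        visited.append(item)
        items.remove(item)  # the list shifts left, but the loop index still advances
    return visited


def remove_from_copy(items):
    """Same work, but iterate over a snapshot so removals don't skip anything."""
    visited = []
    for item in list(items):
        visited.append(item)
        items.remove(item)
    return visited


data = [['p', 'w'], ['w']]
print(remove_while_iterating(data))  # [['p', 'w']]  (['w'] is never visited)
print(data)                          # [['w']]

data = [['p', 'w'], ['w']]
print(remove_from_copy(data))        # [['p', 'w'], ['w']]
print(data)                          # []
```

The skipped `['w']` is exactly why the question's trace never prints `c: w - str: ['w']`.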

Anyway, here's my solution. I'm not sure it's intuitive, but it's probably simpler & shorter:

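  • Use an increasing index
  • Slice the string from that index
  • For each slice, create a set and store letters until you reach the end of the string or hit a letter that is already in the set. The index where you stop is the maximum length for that slice
  • Compute the maximum length on each iteration & store the corresponding string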

Code:

def findLongest(s):
    maxlen = 0
    longest = ""
    for i in range(0,len(s)):
        subs = s[i:]
        chars = set()
        for j,c in enumerate(subs):
            if c in chars:
                break
            else:
                chars.add(c)
        else:
            # add 1 when end of string is reached (no break)
            # handles the case where the longest string is at the end
            j+=1
        if j>maxlen:
            maxlen=j
            longest=s[i:i+j]
    return longest

print(findLongest("pwwkewambb"))

Result:

kewamb
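For comparison, the same "set of seen characters" idea can be done in a single pass with a sliding window that remembers the last index of each character, instead of restarting the scan from every start position. This is a swapped-in technique, not part of the answer above; `find_longest_window` is a hypothetical name. Like the answer, it keeps the first (leftmost) substring when several have the maximum length.

```python
def find_longest_window(s: str) -> str:
    """Sliding window: each character is visited once."""
    last_seen = {}           # char -> most recent index where it appeared
    start = 0                # left edge of the current duplicate-free window
    best_start, best_len = 0, 0
    for i, c in enumerate(s):
        # If c already occurs inside the current window, move the left
        # edge just past that earlier occurrence.
        if c in last_seen and last_seen[c] >= start:
            start = last_seen[c] + 1
        last_seen[c] = i
        # Strict ">" keeps the leftmost window on ties.
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]


print(find_longest_window("pwwkewambb"))  # kewamb
```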

Answer 1 (score: 2)

Edited after @seymour's comment pointing out a wrong result:

from itertools import groupby

def find_longest(s):
    _longest = set()  # characters in the current run
    def longest(x):
        if x in _longest:
            _longest.clear()
            return False
        _longest.add(x)
        return True
    return ''.join(max((list(g) for _, g in groupby(s, key=longest)), key=len))

And testing:

In [101]: assert find_longest('pwwkewambb') == 'kewamb'

In [102]: assert find_longest('abcabcbb') == 'abc'

In [103]: assert find_longest('abczxyabczxya') == 'abczxy'

Old answer:

from itertools import groupby

s = set() ## for mutable access

''.join(max((list(g) for _, g in groupby('pwwkewambb', key=lambda x: not ((s and x == s.pop()) or s.add(x)))), key=len))
'kewamb'

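groupby groups consecutive elements according to the value returned by the function passed as the key argument, which defaults to lambda x: x. Instead of the default, we exploit some state by using a mutable structure (this could be done more intuitively with a regular function).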

lambda x: not ((s and x == s.pop()) or s.add(x))

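Because I can't reassign a global inside a lambda (again, a proper function could do that), I just create a global mutable structure that I can add to and remove from. The key (no pun intended) is that short-circuiting adds or removes items only as needed, so the set only ever holds the previous character.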
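A small sketch that reruns the old answer's key expression on the sample input and records each group together with its key, so the short-circuit behaviour is visible. The key is written as a named function here and the variable names are ours; the expression itself is the one from the answer.

```python
from itertools import groupby

prev = set()  # holds at most the previous character


def key(x):
    # Same expression as the answer's lambda:
    # - prev empty: add x, return True
    # - prev holds a different char: pop it, add x, return True
    # - prev holds the same char: pop it (leaving prev empty), return False
    return not ((prev and x == prev.pop()) or prev.add(x))


groups = [(k, ''.join(g)) for k, g in groupby('pwwkewambb', key=key)]
print(groups)   # [(True, 'pw'), (False, 'w'), (True, 'kewamb'), (False, 'b')]

longest = max((g for _, g in groups), key=len)
print(longest)  # kewamb
```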

max(..., key=len) is fairly self-explanatory: it picks the longest list produced by groupby.

Another version without the mutable-global business:

def longest(x):
    if hasattr(longest, 'last'):
        result = not (longest.last == x)
        longest.last = x
        return result
    longest.last = x
    return True


''.join(max((list(g) for _, g in groupby('pwwkewambb', key=longest)), key=len))
'kewamb'

Answer 2 (score: 1)

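It depends on your definition of repeated characters: if you mean consecutive repeats, the accepted solution is slick, but it doesn't catch characters that occur more than once without being adjacent (e.g. pwwkewabmb -> 'kewabmb').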
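To make that concrete, a sketch that reruns the old groupby approach on that input and checks the result for repeats. `longest_consecutive_only` and `has_unique_chars` are names made up here; the key expression is the one from the old answer.

```python
from itertools import groupby


def longest_consecutive_only(text):
    """Old groupby answer: each char is only compared with the previous one."""
    prev = set()

    def key(x):
        return not ((prev and x == prev.pop()) or prev.add(x))

    return ''.join(max((list(g) for _, g in groupby(text, key=key)), key=len))


def has_unique_chars(sub):
    # No repeats exactly when the set of characters is as long as the string.
    return len(set(sub)) == len(sub)


result = longest_consecutive_only('pwwkewabmb')
print(result, has_unique_chars(result))  # kewabmb False
```

The two `b`s are separated by `m`, so the "same as previous?" test never fires for them.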

Here's what I came up with (Python 2):

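def longest(word):
    begin = 0              # left edge of the current window
    end = 0                # right edge (exclusive) of the current window
    longest = (0,0)        # (begin, end) of the best window so far
    for i in xrange(len(word)):
        try:
            # look for word[i] inside the current window word[begin:end]
            j = word.index(word[i],begin,end)
            # longest?
            if end-begin >= longest[1]-longest[0]:
                longest = (begin,end)
            begin = j+1
        except ValueError:
            pass           # word[i] is not in the window
        end = i+1          # always extend the window to include word[i]
    # end == len(word) here, even for an empty string
    if end-begin >= longest[1]-longest[0]:
        longest = (begin,end)
    return word[slice(*longest)]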

So:

>>> print longest('pwwkewabmb')
kewabm
>>> print longest('pwwkewambb')
kewamb
>>> print longest('bbbb')
b
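Since the question is tagged python-3.x, here is a Python 3 sketch of the same begin/end window idea. It swaps the try/except around `str.index` for `str.find`, which returns -1 instead of raising; the name `longest_window` is ours. Like the answer, ties go to the later window because of `>=`.

```python
def longest_window(word: str) -> str:
    begin = 0        # left edge of the current duplicate-free window
    best = (0, 0)    # (begin, end) of the longest window found so far
    for i, ch in enumerate(word):
        # Search for ch only inside the current window word[begin:i].
        j = word.find(ch, begin, i)
        if j != -1:
            # The window word[begin:i] ends here; record it if it's the longest.
            if i - begin >= best[1] - best[0]:
                best = (begin, i)
            begin = j + 1  # restart just after the earlier copy of ch
    # The final window runs to the end of the string.
    if len(word) - begin >= best[1] - best[0]:
        best = (begin, len(word))
    return word[best[0]:best[1]]


print(longest_window('pwwkewabmb'))  # kewabm
print(longest_window('pwwkewambb'))  # kewamb
print(longest_window('bbbb'))        # b
```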

Answer 3 (score: 1)

My 2 cents:

from collections import Counter

def longest_unique_substr(s: str) -> str:

    # get all substr-ings from s, starting with the longest one
    for substr_len in range(len(s), 0, -1):
        for substr_start_index in range(0, len(s) - substr_len + 1):
            substr = s[substr_start_index : substr_start_index + substr_len]

            # check if all substr characters are unique
            c = Counter(substr)
            if all(v == 1 for v in c.values()):
                return substr

    # ensure empty string input returns ""
    return ""

Running it:

In : longest_unique_substr('pwwkewambb')
Out: 'kewamb'
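A slightly leaner variant of the same brute force, assuming only the uniqueness check changes: a substring has no repeats exactly when its set of characters is as long as the substring, so no `Counter` is needed. The search order (longest lengths first, left to right) is unchanged, so it still returns the leftmost longest substring.

```python
def longest_unique_substr(s: str) -> str:
    # Try every length from longest to shortest, scanning left to right.
    for substr_len in range(len(s), 0, -1):
        for start in range(len(s) - substr_len + 1):
            substr = s[start:start + substr_len]
            # No repeated characters <=> set size equals substring length.
            if len(set(substr)) == len(substr):
                return substr
    # Only reached for the empty string.
    return ""


print(longest_unique_substr('pwwkewambb'))  # kewamb
```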

Answer 4 (score: 0)

400