Longest sequence of consecutive letters

Time: 2017-06-04 15:48:05

Tags: python sequence

Suppose I have a string of lowercase letters, for example

'ablccmdnneofffpg'

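My goal is to find the longest sequence of consecutive letters in this string (the letters need not be adjacent in the string, only in alphabetical order), which in this case is: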

'abcdefg'

A brute-force attempt is to loop over every letter and find the longest sequence that starts at that letter. One possible solution is

word = 'ablccmdnneofffpg'
longest_length = 0
start = None
current_start = 0
while current_start < len(word) - longest_length:
    current_length = 1
    last_in_sequence = ord(word[current_start])
    for i in range(current_start + 1, len(word)):
        if ord(word[i]) - last_in_sequence == 1:
            current_length += 1
            last_in_sequence = ord(word[i])
    if current_length > longest_length:
        longest_length = current_length
        start = current_start
    while (current_start < len(word) - 1 and
           ord(word[current_start + 1]) - ord(word[current_start]) == 1):
        current_start += 1
    current_start += 1

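Is there another way to solve this in fewer lines, perhaps using something more Pythonic?

4 answers:

Answer 0: (score: 7)

You can use a dictionary to keep track of every subsequence of consecutive characters seen in the string, and then pick the longest one.

Each subsequence is keyed by the letter that would extend it, i.e. the next letter of the alphabet after its last character. When that expected letter turns up in the string, the stored subsequence is extended with it and the result is stored again under the letter that follows.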

def longest_sequence(s):
    d = {}
    for x in s:
        if x in d:
            d[chr(ord(x)+1)] = d[x] + x
        else:
            d[chr(ord(x)+1)] = x
    return max(d.values(), key=len)

print(longest_sequence('ablccmdnneofffpg'))
# abcdefg
print(longest_sequence('ba'))
# b
print(longest_sequence('sblccmtdnneofffpgtuyvgmmwwwtxjyuuz'))
# stuvwxyz
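
One edge case worth noting: with an empty string the dictionary stays empty and `max()` raises `ValueError`. A minimal sketch of a guarded variant follows; the name `longest_sequence_safe` is made up for illustration, and it assumes Python 3.4 or later, where `max()` accepts a `default` argument.

```python
def longest_sequence_safe(s):
    # Hypothetical variant of longest_sequence above; same idea, but
    # tolerant of empty input.
    d = {}
    for x in s:
        # d.get(x, '') is the subsequence waiting for x, or '' if none is.
        d[chr(ord(x) + 1)] = d.get(x, '') + x
    # default='' is returned when d has no values (empty input).
    return max(d.values(), key=len, default='')


print(longest_sequence_safe('ablccmdnneofffpg'))
print(repr(longest_sequence_safe('')))
```

Using `d.get(x, '')` folds the two branches of the original into one line without changing what is stored.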

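Answer 1: (score: 1)

A solution that trades memory for (some) time:

It keeps track of every sequence seen so far and picks the longest one at the end (there may be more than one of that length).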

from contextlib import suppress


class Sequence:
    def __init__(self, letters=''):
        self.letters = letters
        self.last = self._next_letter(letters[-1:])

    def append(self, letter):
        self.letters += letter
        self.last = self._next_letter(letter)

    def _next_letter(self, letter):
        with suppress(TypeError):
            return chr(ord(letter) + 1)
        return 'a'

    def __repr__(self):
        return 'Sequence({}, {})'.format(repr(self.letters),
                                         repr(self.last))


word = 'ablccmdnneofffpg'
sequences = []
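for letter in word:
    # Extend the longest sequence waiting for this letter; taking merely the
    # first match can strand a longer one (e.g. 'babc' would yield 'bc').
    candidates = [s for s in sequences if s.last == letter]
    if candidates:
        max(candidates, key=lambda s: len(s.letters)).append(letter)
    else:
        sequences.append(Sequence(letters=letter))

sequences = sorted(sequences, key=lambda s: len(s.letters), reverse=True)
print(sequences[0].letters)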

Answer 2: (score: 0)

What you are asking for is essentially a variant of the longest increasing subsequence problem, which is well studied. Take a look at the pseudo code on Wikipedia.
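
As a rough illustration of that connection, here is a sketch of the textbook quadratic dynamic programme for longest increasing subsequence, with the "greater than" test tightened to "exactly one greater". This is not the Wikipedia pseudo code itself, and the function name is invented for the example.

```python
def longest_consecutive_subsequence(s):
    # Hypothetical helper: O(n^2) DP in the style of the classic LIS recurrence.
    n = len(s)
    if n == 0:
        return ''
    # length[i] is the length of the best run that ends exactly at index i.
    length = [1] * n
    for i in range(n):
        for j in range(i):
            # Plain LIS would test s[j] < s[i]; here the step must be exactly 1.
            if ord(s[i]) - ord(s[j]) == 1 and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
    # The first index holding the maximum wins on ties.
    end = max(range(n), key=lambda i: length[i])
    last = ord(s[end])
    # A run is fully determined by its last letter and its length.
    return ''.join(chr(c) for c in range(last - length[end] + 1, last + 1))


print(longest_consecutive_subsequence('ablccmdnneofffpg'))
```

Because every step is exactly one letter, the dictionary-based answers get the same result in a single pass; the quadratic form is shown only to make the link to the general problem visible.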

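Answer 3: (score: 0)

Similar to MosesKoledoye's solution, but it stores only the lengths, keyed by character ordinal, and builds the result string only at the end. It should therefore be more space-efficient: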

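def longest_seq(s):
    d = {}  # maps ord(last letter) -> length of the best run ending there
    for c in s:
        c, prev_c = ord(c), ord(c) - 1
        # Extend the run ending at the previous letter, unless a longer
        # run ending at this letter has already been recorded.
        d[c] = max(d.get(c, 0), d.pop(prev_c, 0) + 1)
    # Pick the longest run, then rebuild it from its last letter and length.
    c, length = max(d.items(), key=lambda i: i[1])
    return ''.join(map(chr, range(c - length + 1, c + 1)))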
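
If the input really is restricted to the lowercase letters 'a' to 'z', as in the question, the same bookkeeping could be kept in a fixed list of 26 counters instead of a dictionary. The following is only a sketch under that assumption; `longest_seq_fixed` is a made-up name, and any other character would index the list incorrectly.

```python
def longest_seq_fixed(s):
    # Hypothetical variant: assumes s contains only 'a'..'z'.
    if not s:
        return ''
    # best[i] is the longest run found so far that ends at letter number i.
    best = [0] * 26
    for ch in s:
        i = ord(ch) - ord('a')
        prev = best[i - 1] if i > 0 else 0
        best[i] = max(best[i], prev + 1)
    # On ties the alphabetically first ending letter wins.
    end = max(range(26), key=lambda i: best[i])
    return ''.join(chr(ord('a') + k) for k in range(end - best[end] + 1, end + 1))


print(longest_seq_fixed('ablccmdnneofffpg'))
```

Note the tie-breaking differs from the dictionary versions: for 'ba' this returns 'a', whereas they return 'b'. Both are valid answers of length one.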