Python string compression not quite right

Time: 2018-06-10 15:41:13

Tags: python string

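I have the following code, which is explained in its docstring. How do I get it to stop flagging single letters with a 1, so that a single character no longer turns into two characters in the final compressed string?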

For example, as the docstring shows, it turns AAABBBBCDDDD -> A3B4C1D4, but I want it to produce A3B4CD4. I am new to this, so any comments are greatly appreciated.

class StringCompression(object):
    '''
    Run Length Compression Algorithm: Given a string of letters, such as
    nucleotide sequences, compress it using numbers to flag contiguous repeats.
    Ex: AAABBBBCDDDD -> A3B4C1D4


    >>> x = StringCompression('AAAAbC')
    >>> x.compress()
    'A4bC'
    '''
    def __init__(self, string):
        self.string = string

    def compress(self):
        '''Executes compression on the object.'''
        run = ''
        length = len(self.string)

        if length == 0:
            return ''

        if length == 1:
            return self.string #+ '1'

        last = self.string[0]

        count = 1

        i = 1

        while i < length:

            if self.string[i] == self.string[i - 1]:
                count += 1

            else:
                run = run + self.string[i - 1] + str(count)
                count = 1

            i += 1

        run = (run + self.string[i - 1] + str(count))

        return run

2 Answers:

Answer 0: (Score: 2)

Here is an alternative solution using itertools.groupby and a generator:

from itertools import chain, groupby

x = 'AAABBBBCDDDD'

def compressor(s):
    for i, j in groupby(s):
        size = len(list(j))
        yield (i, '' if size==1 else str(size))

res = ''.join(chain.from_iterable(compressor(x)))

print(res)

A3B4CD4
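
To see why this works, it may help to look at what groupby yields: one (key, group) pair for each run of consecutive equal characters. A minimal sketch follows; the helper name `runs` is hypothetical and not part of the answer above.

```python
from itertools import groupby


def runs(s):
    """Return (character, run_length) pairs for consecutive repeats in s."""
    # groupby only merges *adjacent* equal items, which is exactly
    # what run-length encoding needs.
    return [(char, sum(1 for _ in group)) for char, group in groupby(s)]


print(runs('AAABBBBCDDDD'))
# [('A', 3), ('B', 4), ('C', 1), ('D', 4)]
```

The compressor in the answer then just drops the length whenever it equals 1.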

Answer 1: (Score: 0)

Now it works the way I want. Thanks!

class StringCompression(object):
    '''
    Run Length Compression Algorithm: Given a string of letters, such as
    nucleotide sequences, compress it using numbers to flag contiguous repeats.
    Ex: AAABBBBCDDDD -> A3B4CD4
    Notice that single letters do not get a 1 flag, to prevent expansion.

    >>> x = StringCompression('AAAAbC')
    >>> x.compress()
    'A4bC'
    '''
    def __init__(self, string):
        self.string = string

    def compress(self):
        '''Executes compression on the object.'''
        run = ''
        length = len(self.string)

        if length == 0:
            return ''

        if length == 1:
            return self.string #+ '1'

        last = self.string[0]

        count = 1

        i = 1

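        while i < length:

            if self.string[i] == self.string[i - 1]:
                count += 1

            else:
                # Append the count only for runs longer than one character.
                # (Stripping every '1' afterwards would also damage counts
                # such as 10 or 12.)
                run = run + self.string[i - 1] + (str(count) if count > 1 else '')
                count = 1

            i += 1

        # Flush the final run using the same rule.
        run = run + self.string[i - 1] + (str(count) if count > 1 else '')

        return run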
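
A case worth testing with any version of this method is a run of ten or more characters, because the count then contains the digit 1 itself. Below is a minimal standalone sketch, not the poster's code; the function name `compress_runs` is hypothetical, and it assumes the input contains no digits. It decides whether to append the count while building the result, so multi-digit counts stay intact.

```python
def compress_runs(s):
    """Run-length encode s, omitting the count for runs of length 1.

    Assumption: s contains no digit characters, otherwise the
    encoded output would be ambiguous.
    """
    if not s:
        return ''
    parts = []
    count = 1
    # Compare each character with its predecessor.
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            parts.append(prev + (str(count) if count > 1 else ''))
            count = 1
    # Flush the final run.
    parts.append(s[-1] + (str(count) if count > 1 else ''))
    return ''.join(parts)


print(compress_runs('AAABBBBCDDDD'))  # A3B4CD4
print(compress_runs('A' * 12 + 'B'))  # A12B
```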