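I have the following code, which is explained in its docstring. How do I stop it from flagging single letters with a 1, so that a single letter doesn't grow into two characters in the final compressed string?
For example, as the docstring shows, AAABBBBCDDDD -> A3B4C1D4, but I want it to become A3B4CD4. I'm new to this, so any comments are much appreciated.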
class StringCompression(object):
    '''
    Run Length Compression Algorithm: Given a string of letters, such as
    nucleotide sequences, compress it using numbers to flag contiguous repeats.
    Ex: AAABBBBCDDDD -> A3B4C1D4
    >>> x = StringCompression('AAAAbC')
    >>> x.compress()
    'A4bC'
    '''
    def __init__(self, string):
        self.string = string

    def compress(self):
        '''Executes compression on the object.'''
        run = ''
        length = len(self.string)
        if length == 0:
            return ''
        if length == 1:
            return self.string  #+ '1'
        last = self.string[0]
        count = 1
        i = 1
        while i < length:
            if self.string[i] == self.string[i - 1]:
                count += 1
            else:
                run = run + self.string[i - 1] + str(count)
                count = 1
            i += 1
        run = (run + self.string[i - 1] + str(count))
        return run
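A minimal sketch of the change being asked about, written as a standalone function rather than the class above (the name `compress` is hypothetical): append the count only when a run is longer than one character. It assumes the input contains no digit characters, since a digit in the input would make the output ambiguous.

```python
def compress(s: str) -> str:
    """Run-length encode s, omitting the count for runs of length 1.

    Assumes s contains no digit characters.
    """
    if not s:
        return ''
    parts = []
    prev = s[0]
    count = 1
    for ch in s[1:]:
        if ch == prev:
            count += 1
        else:
            # Flush the finished run; append the count only if it exceeds 1.
            parts.append(prev + (str(count) if count > 1 else ''))
            prev = ch
            count = 1
    # Flush the final run, which the loop never closes.
    parts.append(prev + (str(count) if count > 1 else ''))
    return ''.join(parts)


print(compress('AAABBBBCDDDD'))  # A3B4CD4
print(compress('AAAAbC'))        # A4bC
```

Because the count is written with `str(count)`, runs of 10 or more still produce multi-digit counts, e.g. twelve A's become `A12`.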
Answer 0 (score: 2)
Here is an alternative solution using itertools.groupby and a generator:
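from itertools import chain, groupby

x = 'AAABBBBCDDDD'

def compressor(s):
    # groupby yields (character, iterator over that run) for each contiguous run
    for i, j in groupby(s):
        size = len(list(j))
        # Emit the count only for runs longer than 1
        yield (i, '' if size == 1 else str(size))

res = ''.join(chain.from_iterable(compressor(x)))
print(res)  # A3B4CD4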
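Omitting the 1 does not make the output ambiguous, provided the original string contains no digits: a letter that is not followed by a number simply stands for a run of one. A minimal decoding sketch illustrating this, assuming digit-free input; `decompress` is a hypothetical helper, not part of either answer.

```python
import re


def decompress(encoded: str) -> str:
    """Expand a string such as 'A3B4CD4' back to 'AAABBBBCDDDD'.

    Each non-digit character starts a run and may be followed by a decimal
    count; a missing count means the run has length 1.
    """
    # (\D) captures one non-digit character, (\d*) its possibly empty count.
    return ''.join(ch * (int(n) if n else 1)
                   for ch, n in re.findall(r'(\D)(\d*)', encoded))


print(decompress('A3B4CD4'))  # AAABBBBCDDDD
```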
Answer 1 (score: 0)
Now it works the way I want. Thanks!
class StringCompression(object):
    '''
    Run Length Compression Algorithm: Given a string of letters, such as
    nucleotide sequences, compress it using numbers to flag contiguous repeats.
    Ex: AAABBBBCDDDD -> A3B4CD4
    Notice that single letters do not get a 1 flag, to prevent expansion.
    >>> x = StringCompression('AAAAbC')
    >>> x.compress()
    'A4bC'
    '''
    def __init__(self, string):
        self.string = string

    def compress(self):
        '''Executes compression on the object.'''
        run = ''
        length = len(self.string)
        if length == 0:
            return ''
        count = 1
        i = 1
        while i < length:
            if self.string[i] == self.string[i - 1]:
                count += 1
            else:
                # Append the count only for runs longer than 1. Deleting every
                # '1' from the finished string instead would also corrupt
                # counts such as 10 or 21.
                run = run + self.string[i - 1] + (str(count) if count > 1 else '')
                count = 1
            i += 1
        # Flush the last run (this also covers a one-character string).
        run = run + self.string[i - 1] + (str(count) if count > 1 else '')
        return run