Question

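I have strings that contain runs of the same character, such as '1254,,,,,,,,,,,,,,,,982'. What I am aiming to do is replace such a string with something along the lines of '1254(,16)982', so that the original string can be reconstructed. If anyone could point me in the right direction, that would be greatly appreciated.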

Answer 1

You are looking for run-length encoding. Here is a Python implementation loosely based on this one.

import itertools

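def runlength_enc(s):
    '''Return a run-length encoded version of the string'''
    # (char, run length) for each run of identical characters
    enc = ((x, sum(1 for _ in gp)) for x, gp in itertools.groupby(s))
    # Keep runs of length 1 as bare characters
    removed_1s = [((c, n) if n > 1 else c) for c, n in enc]
    # Bare characters have len 1, (char, count) tuples have len 2:
    # join neighbouring bare characters into one literal string
    joined = [["".join(g)] if n == 1 else list(g)
                    for n, g in itertools.groupby(removed_1s, key=len)]
    return list(itertools.chain(*joined))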

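def runlength_decode(enc):
    '''Rebuild the original string from the encoded list'''
    # Test for tuples rather than len(c) == 2, so that a two-character
    # literal such as 'ab' is not mistaken for a (char, count) pair
    return "".join((c[0] * c[1] if isinstance(c, tuple) else c) for c in enc)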

For your example:

print(runlength_enc("1254,,,,,,,,,,,,,,,,982"))
# ['1254', (',', 16), '982']
print(runlength_decode(runlength_enc("1254,,,,,,,,,,,,,,,,982")))
# 1254,,,,,,,,,,,,,,,,982

(Note that this only pays off if the string contains long runs.)

Answer 2

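If you don't care about the exact compressed form, you may want to look at zlib.compress and zlib.decompress. zlib is a standard Python library that can compress a single string, and it will probably achieve better compression than a compression algorithm you implement yourself.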
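As a rough illustration of the zlib suggestion, here is a minimal round trip. It assumes Python 3, where zlib.compress and zlib.decompress operate on bytes, so the text is encoded first; the choice of UTF-8 is an assumption, not something the answer specifies.

```python
import zlib

s = "1254,,,,,,,,,,,,,,,,982"

# zlib works on bytes, so encode the text first (UTF-8 is an assumption)
compressed = zlib.compress(s.encode("utf-8"))

# Decompress and decode to get the original text back
restored = zlib.decompress(compressed).decode("utf-8")

print(restored == s)            # True
print(len(s), len(compressed))  # compare the sizes for your own data
```

The zlib format adds a small fixed overhead (a header and a checksum), so for very short strings the result may not be smaller than the input. The benefit tends to show on longer inputs.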

Answer 3

Using a regular expression:

s = '1254,,,,,,,,,,,,,,,,982'

import re
c = re.sub(r'(.)\1+', lambda m: '(%s%d)' % (m.group(1), len(m.group(0))), s)
print(c)  # 1254(,16)982

Using itertools:

import itertools
c = ''
# 'ch' rather than 'chr', to avoid shadowing the built-in chr()
for ch, g in itertools.groupby(s):
    k = len(list(g))
    c += ch if k == 1 else '(%s%d)' % (ch, k)
print(c)  # 1254(,16)982
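The question also asks for the original string to be reconstructed, and neither snippet above shows that step. One possible sketch of a decoder for the '(<char><count>)' format follows. The name decode_runs is hypothetical, and the approach assumes the original text never contains something that already looks like an encoded group (for example a literal '(a3)'), because the format cannot tell the two apart.

```python
import re

def decode_runs(encoded):
    """Expand every '(<char><count>)' group back into a run of <char>."""
    # (.) captures the repeated character, (\d+) the run length.
    # re.DOTALL lets '.' match a newline, in case a run of newlines was encoded.
    return re.sub(r'\((.)(\d+)\)',
                  lambda m: m.group(1) * int(m.group(2)),
                  encoded,
                  flags=re.DOTALL)

original = '1254' + ',' * 16 + '982'
print(decode_runs('1254(,16)982') == original)  # True
```

If the input may contain parentheses followed by digits, a format with an escape character or an explicit list of tokens, as in Answer 1, is a safer choice.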

How to compress by removing duplicates in Python?
