Question

I'm looking for an efficient way to sanitize a string so that every run of more than 2 identical characters is cut off after the first 2.

Some input -> output examples:

hellooooooooo -> helloo
woooohhooooo -> woohhoo

I'm currently looping over the characters, but that is a bit slow. Does anyone have another solution (regex or otherwise)?

Edit: current code:

word_new = ""
for i in range(0, len(word) - 2):
    if not word[i] == word[i+1] == word[i+2]:
        word_new = word_new + word[i]
for i in range(len(word) - 2, len(word)):
    word_new = word_new + word[i]

Answer 1

Edit: after applying the helpful comments

import re

def ReplaceThreeOrMore(s):
    # pattern to look for three or more repetitions of any character, including
    # newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL) 
    return pattern.sub(r"\1\1", s)
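
As a quick check, the function can be run against the two examples from the question. This is only a minimal sketch: the function is repeated so the snippet is self-contained, and the `print` calls assume Python 3.

```python
import re

def ReplaceThreeOrMore(s):
    # Any character (newlines included) followed by two or more copies of itself.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL)
    # Keep exactly two copies of the repeated character.
    return pattern.sub(r"\1\1", s)

for word in ("hellooooooooo", "woooohhooooo"):
    print(word, "->", ReplaceThreeOrMore(word))
# hellooooooooo -> helloo
# woooohhooooo -> woohhoo
```

Runs of exactly two characters (the "ll" in "hello", the "hh" in the second example) are left alone because `\1{2,}` needs at least two further copies after the captured character.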

(Original reply) Try something like this:

import re

# look for a character followed by at least one repetition of itself.
pattern = re.compile(r"(\w)\1+")

# a function to perform the substitution we need:
def repl(matchObj):
   char = matchObj.group(1)
   return "%s%s" % (char, char)

>>> pattern.sub(repl, "Foooooooooootball")
'Football'

Answer 2

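The following code (unlike the other regex-based answers) does exactly what you asked for: it replaces every sequence of more than 2 identical characters with 2 of that character.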

>>> import re
>>> text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'
>>> pattern = r'(.)\1{2,}'
>>> repl = r'\1\1'
>>> re.sub(pattern, repl, text, flags=re.DOTALL)
'the numberr off\n\nthee beast is 66 ..'
>>>

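You may not want to apply this treatment to some or all of the following: digits, punctuation, spaces, tabs, newlines, and so on. In that case, replace the . with a more restrictive subpattern.

For example:

ASCII letters: [A-Za-z]

Any letter, depending on the locale: [^\W\d_] together with the re.LOCALE flag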
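
The two restricted subpatterns can be sketched as follows. This assumes Python 3 and str input, where `\w` is already Unicode-aware, so the second pattern is used without `re.LOCALE` (Python 3 accepts that flag only for bytes patterns). The sample strings are made up for illustration.

```python
import re

text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'

# ASCII letters only: digits, punctuation and newlines are left untouched.
ascii_result = re.sub(r'([A-Za-z])\1{2,}', r'\1\1', text)
print(repr(ascii_result))
# 'the numberr off\n\n\n\nthee beast is 666 ...'

# Any letter: [^\W\d_] is "a word character that is neither a digit nor an underscore".
letter_result = re.sub(r'([^\W\d_])\1{2,}', r'\1\1', 'h\u00e9\u00e9\u00e9llo 1111 ___')
print(letter_result == 'h\u00e9\u00e9llo 1111 ___')
# True
```

DOTALL is dropped here because neither character class can match a newline anyway.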

Answer 3

Also using a regex, but without a function:

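import re

# {2,} means "two or more further copies", so runs of three or more are matched.
# (With {3,}, a run of exactly three characters would be left unchanged.)
expr = r'(.)\1{2,}'
replace_by = r'\1\1'

mystr1 = 'hellooooooo'
print(re.sub(expr, replace_by, mystr1))

mystr2 = 'woooohhooooo'
print(re.sub(expr, replace_by, mystr2))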

Answer 4

I don't really know Python regexes, but you can adapt this:

s/((.)\2)\2+/$1/g;
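
One possible Python adaptation of that substitution is sketched below. The function name `squeeze` is hypothetical; the pattern is carried over unchanged, and Perl's `$1` becomes `\1` in the replacement string.

```python
import re

def squeeze(s):
    # Group 1, ((.)\2), captures a doubled character such as "oo";
    # the trailing \2+ consumes every further copy of that character.
    # Replacing the whole match with group 1 keeps exactly two copies.
    return re.sub(r'((.)\2)\2+', r'\1', s)

print(squeeze('hellooooooooo'))  # helloo
print(squeeze('woooohhooooo'))   # woohhoo
```

`re.sub` replaces every match by default, which corresponds to the `/g` modifier. As in the Perl original, `.` does not match newlines here.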

Answer 5

I'm posting my code; it isn't a regex, but since you mentioned "or otherwise"...

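def removeD(text):
    # Strings shorter than 3 characters cannot contain a run of 3.
    if len(text) < 3:
        return text

    # The first two characters are always kept.
    output = text[0:2]
    for i in range(2, len(text)):
        # Skip a character only if it repeats the two before it.
        if not text[i] == text[i-1] == text[i-2]:
            output += text[i]

    return output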

It's not as nice as bgporter's (no kidding, I really like his better than mine!) but, at least on my system, time reports that it consistently runs faster.
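
To reproduce that kind of comparison, a rough `timeit` sketch could look like the one below. The names `regex_version` and `loop_version`, the sample string, and the call count are all assumptions made for illustration. No timings are shown because they depend on the machine, the Python version, and the input.

```python
import re
import timeit

# Compiled once, so the measurement covers substitution only.
PATTERN = re.compile(r"(.)\1{2,}", re.DOTALL)

def regex_version(s):
    return PATTERN.sub(r"\1\1", s)

def loop_version(s):
    if len(s) < 3:
        return s
    out = s[0:2]
    for i in range(2, len(s)):
        # Drop a character only if it repeats the two before it.
        if not s[i] == s[i - 1] == s[i - 2]:
            out += s[i]
    return out

sample = "woooohhooooo"  # arbitrary short input; longer inputs may rank differently
assert regex_version(sample) == loop_version(sample)

for fn in (regex_version, loop_version):
    seconds = timeit.timeit(lambda: fn(sample), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f}s for 100000 calls")
```

Short words tend to favor the plain loop because there is little work to amortize the regex call overhead, so it is worth timing with inputs that resemble the real data.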

Python: how to cut sequences of more than 2 equal characters in a string
