I want to remove a word from a sentence if the word starts with 4 or more repeated characters.
For example:
['aaaaaaa is really good', 'nott something great',
'ssssssssssssstackoverflow is a great community']
I need output like this:
['is really good', 'nott something great', 'is a great community']
I tried something like this:
^(\S)\1{3,}
It removes the repeated characters, but not the rest of the word. Thanks.
Answer 0 (score: 2)
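Add \S*\s to the end of the pattern, so that it also consumes the rest of the word and the whitespace that follows it: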
import re
words = ['aaaaaaa is really good', 'nott something great', 'ssssssssssssstackoverflow is a great community']
newWords = [re.sub(r'^(\S)\1{3,}\S*\s', '', word) for word in words]
print(newWords)
Output:
['is really good', 'nott something great', 'is a great community']
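If a string may consist of a single word only, make the trailing whitespace optional by using \s? instead of \s. With a mandatory \s, a one-word string has no whitespace to match, so the pattern fails and the word is left in place.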
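To make the difference between the two endings concrete, here is a minimal sketch that runs both variants on the same input, including a one-word string. The names STRICT, LENIENT, samples, strict_result and lenient_result are made up for this illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: the trailing whitespace is mandatory.
STRICT = re.compile(r'^(\S)\1{3,}\S*\s')
# Variant with optional trailing whitespace, for one-word strings.
LENIENT = re.compile(r'^(\S)\1{3,}\S*\s?')

# Illustrative input; the last item is a single word with no whitespace.
samples = ['aaaaaaa is really good', 'nott something great', 'aaaaaaa']

strict_result = [STRICT.sub('', s) for s in samples]
lenient_result = [LENIENT.sub('', s) for s in samples]

print(strict_result)   # ['is really good', 'nott something great', 'aaaaaaa']
print(lenient_result)  # ['is really good', 'nott something great', '']
```

Note that 'nott something great' is untouched in both cases, because its first character is not repeated at least four times at the start of the string.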