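I am very new to Python.
I want to correct a sentence if it contains repeated words.
Right now I am using this regex, but it also changes letters inside words. For example: "My friend and I are happy" -> "My friend and are happy" (it removes the "I" and the space). ERROR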
text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row
How can I make the same change so that it checks whole words instead of letters?
Answer 0 (score: 7)
A non-regex solution using itertools.groupby:
>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
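The same idea can be wrapped in a small function. This is only a sketch: the name collapse_repeats and the ignore_case flag are made up for illustration and are not part of the answer above. Note that split() followed by " ".join() also normalizes any run of whitespace to a single space, which may or may not be what you want.

```python
from itertools import groupby


def collapse_repeats(sentence, ignore_case=False):
    """Collapse consecutive duplicate words, keeping the first of each run.

    collapse_repeats and ignore_case are hypothetical names for this sketch.
    """
    # groupby groups *adjacent* equal items, so non-adjacent repeats survive.
    key = str.lower if ignore_case else None
    # next(group) returns the first word of each run, preserving its casing.
    return " ".join(next(group) for _, group in groupby(sentence.split(), key=key))


print(collapse_repeats("this is just is is"))
print(collapse_repeats("Hello hello world world", ignore_case=True))
```

Because only adjacent words are compared, the second "is" in "this is just is is" is kept: it is separated from the first one by "just".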
Answer 1 (score: 4)
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove consecutive duplicate words
\b matches the empty string, but only at the beginning or end of a word.
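To see why the word boundaries matter, here is a small comparison of the pattern from the question with the pattern from this answer. It is only a sketch: the constant names and the dedupe_words helper are made up for illustration.

```python
import re

# Pattern from the question: no separator and no boundaries, so it also
# matches a repeated run of letters inside a single word.
LETTER_PATTERN = r'(\w+)\1'

# Pattern from this answer: a whole word, followed by one or more
# repetitions of " <same word>", each ending at a word boundary.
WORD_PATTERN = r'\b(\w+)( \1\b)+'


def dedupe_words(text):
    """Collapse consecutive duplicate words (hypothetical helper name)."""
    return re.sub(WORD_PATTERN, r'\1', text)


# The question's pattern damages ordinary words with doubled letters.
print(re.sub(LETTER_PATTERN, r'\1', "happy"))

# The word-based pattern only touches whole repeated words.
print(dedupe_words("this just so so so nice"))
print(dedupe_words("this is so some"))
```

The leading \b stops the "is" at the end of "this" from pairing with the following "is", and the trailing \b stops "so" from pairing with the start of "some".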
Answer 2 (score: 0)
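\b: matches a word boundary
\w: matches any word character
\1: a backreference to the first capture group; in the pattern it matches the same word again, and in the replacement it stands for that captured word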
import re
def Remove_Duplicates(Test_string):
    # Collapse each run of the same word (compared case-insensitively) into its first occurrence.
    Pattern = r"\b(\w+)(?:\W\1\b)+"
    return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)
Test_string1 = "Good bye bye world world"
Test_string2 = "Ram went went to to his home"
Test_string3 = "Hello hello world world"
print(Remove_Duplicates(Test_string1))
print(Remove_Duplicates(Test_string2))
print(Remove_Duplicates(Test_string3))
Output:
Good bye world
Ram went to his home
Hello world