Question

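When trying to remove all repeated words from a string, as in the example below, what is the correct syntax for checking whether a word is repeated one or more times? The example below returns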

cat cat in the hat hat hat

It ignores the multiple repetitions in the string and only fully collapses "in" and "the", which are each repeated just once.

>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat cat cat in in the the hat hat hat hat hat hat')

Answer 1

This should print the given sentence with the consecutive repeats removed:

check_for_repeats = 'cat cat cat in in the the hat hat hat hat hat hat'
words = check_for_repeats.split()
sentence_array = []

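# Keep a word only when it differs from the word that follows it,
# so the last copy in each run of repeats survives.
for index, word in enumerate(words[:-1]):
    if word != words[index + 1]:
        sentence_array.append(word)
# The final word always ends its run, so keep it whenever there is one.
if words:
    sentence_array.append(words[-1])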

sentence = ' '.join(sentence_array)
print(sentence)
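
The same compare-with-neighbour idea can also be sketched with itertools.groupby from the standard library, which bundles consecutive equal items into runs. This is only an illustrative variant, not part of the original answer, and the function name collapse_repeats is made up for the example.

```python
from itertools import groupby


def collapse_repeats(text):
    """Return text with each run of identical adjacent words cut to one word."""
    # groupby yields (key, run) pairs for consecutive equal words;
    # keeping only the key keeps one copy per run.
    return ' '.join(word for word, _run in groupby(text.split()))


print(collapse_repeats('cat cat cat in in the the hat hat hat hat hat hat'))
```

Because only adjacent copies are merged, a word that reappears later in the sentence (for example 'a b a') is left alone.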

Answer 2

Try this regex:

(\b[a-z]+)(?: \1)+

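What I did was put your \1 inside a non-capturing group so that it can be repeated one or more times. You can then do the replacement just as you did before: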

re.sub(r'(\b[a-z]+)(?: \1)+', r'\1', 'cat cat cat in in the the hat hat hat hat hat hat')
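
To see why the quantifier matters, here is a small side-by-side sketch of the question's pattern and the quantified one on the same input. The variable names are made up for the illustration.

```python
import re

text = 'cat cat cat in in the the hat hat hat hat hat hat'

# Pattern from the question: each match consumes exactly one pair of words,
# so a run of three or more copies is only partly collapsed.
pairs_only = re.sub(r'(\b[a-z]+) \1', r'\1', text)

# Quantified non-capturing group: each match consumes the whole run.
whole_runs = re.sub(r'(\b[a-z]+)(?: \1)+', r'\1', text)

print(pairs_only)   # cat cat in the hat hat hat
print(whole_runs)   # cat in the hat
```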

Answer 3

Try this:

re.sub(r'(\b[a-z]+)(?: \1)+', r'\1', 'cat cat cat in in the the hat hat hat hat hat hat')

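The repetition operator (+) after the group containing the backreference makes the pattern match any number of consecutive repeats.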
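
One thing the pattern above does not guard against is a repeat that is only the prefix of a longer word: on 'the then' it matches 'the the' and produces 'then'. A minimal sketch that adds a trailing \b after the backreference is shown below. The extra \b and the helper name dedupe_adjacent are assumptions added for illustration, not part of the original answer.

```python
import re

# Assumed refinement: the trailing \b forces every repeat to be a whole word.
WHOLE_WORD_REPEATS = re.compile(r'\b([a-z]+)(?: \1\b)+')


def dedupe_adjacent(text):
    """Collapse each run of the same lowercase word into a single copy."""
    return WHOLE_WORD_REPEATS.sub(r'\1', text)


print(dedupe_adjacent('cat cat cat in in the the hat hat hat hat hat hat'))
print(dedupe_adjacent('the then'))  # no whole-word repeat, so left unchanged
```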

Answer 4

You can use:

re.sub(r'(\b[a-z]+) (?=\1\b)', '', 'cat cat cat in in the the hat hat hat hat hat hat')
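
As a rough sketch of how this lookahead version works: instead of collapsing a whole run in one match, it deletes every word (plus its trailing space) that is immediately followed by the same whole word, so only the last copy of each run remains. The variable names below are made up for the example.

```python
import re

text = 'cat cat cat in in the the hat hat hat hat hat hat'

# Delete "word + space" whenever the same whole word comes next.
# (?=...) only peeks ahead without consuming, so the following copy is
# still available to be tested against the copy after it.
result = re.sub(r'(\b[a-z]+) (?=\1\b)', '', text)

print(result)
```

The \b inside the lookahead keeps a word from being removed when it is merely the prefix of the next word.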

Answer 5

A non-regex alternative for when word order does not matter:

" ".join(set(string_with_duplicates.split()))

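This first splits the string on whitespace, converts the resulting list to a set (which drops duplicates, since every element of a set is unique), and then joins the items back into a string. The order of the words in the result is arbitrary.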

>>> string_with_duplicates = 'cat cat cat in in the the hat hat hat hat hat hat'
>>> " ".join(set(string_with_duplicates.split()))
'the in hat cat'

If the word order needs to be preserved, you can write something like this:

>>> unique = []
>>> for w in string_with_duplicates.split():
...     if w not in unique:
...         unique.append(w)
...
>>> " ".join(unique)
'cat in the hat'
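
A shorter order-preserving variant can be sketched with dict.fromkeys, relying on the fact that dictionaries keep insertion order in Python 3.7 and later. This is offered as an alternative sketch rather than the answerer's method, and the name unique_in_order is made up for the example.

```python
string_with_duplicates = 'cat cat cat in in the the hat hat hat hat hat hat'

# dict.fromkeys keeps the first occurrence of each word, in insertion order.
unique_in_order = ' '.join(dict.fromkeys(string_with_duplicates.split()))

print(unique_in_order)
```

Like the set and list versions, this removes every later occurrence of a word, not just adjacent repeats, so 'a b a' becomes 'a b'.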

Python 3.x regular expression syntax
