Question

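I recently tried to write a regular expression that removes strings which sit directly next to each other, not interrupted by any other string, and keeps only one of them. What I have so far is: https://regex101.com/r/Cs0bmY/7. It should work with every possible URL, with or without www. in front and with any ending (such as .com or .nl). The strings (a list of URLs) look like this: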

operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
amazon.de
fonts.gstatic.com
fonts.gstatic.com
fonts.gstatic.com
erovoyeurism.net
tugtechnologyandbusiness.com

The final result should look like this:

operator.livrareflori.md
amazon.de
fonts.gstatic.com
erovoyeurism.net
tugtechnologyandbusiness.com

As you can see, the repeated strings that were not interrupted by another string are gone, and only one occurrence of each is kept.

Answer 1

You can match

^(.+)$(?:\n\1)+

This captures the first line and matches the repeated lines that follow it. Then replace the whole match with the first capture group:

\1

(or whatever the equivalent syntax for the first group is in your environment)

https://regex101.com/r/Cs0bmY/8
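
For anyone applying this outside regex101, here is a minimal sketch in Python. The function name `collapse_adjacent_duplicates` is made up for illustration. It also adds a trailing `$` after `\1`, which is my own assumption rather than part of the answer above: without it, a line such as amazon.de followed by amazon.de.uk would be treated as a repeat, because `\1` only has to match the start of the next line.

```python
import re

# Multiline mode so that ^ and $ match at every line boundary.
# The extra $ after \1 (an addition, not in the original pattern)
# forces the repeated line to be identical, not just a prefix match.
DUPLICATE_RUN = re.compile(r"^(.+)$(?:\n\1$)+", re.MULTILINE)


def collapse_adjacent_duplicates(text: str) -> str:
    """Replace each run of identical consecutive lines with one copy."""
    return DUPLICATE_RUN.sub(r"\1", text)


if __name__ == "__main__":
    urls = "\n".join([
        "operator.livrareflori.md",
        "operator.livrareflori.md",
        "amazon.de",
        "fonts.gstatic.com",
        "fonts.gstatic.com",
        "erovoyeurism.net",
    ])
    print(collapse_adjacent_duplicates(urls))
```

Only consecutive repeats are collapsed, so a URL that shows up again later, after a different line, is kept.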

Answer 2

Using Notepad++, you can do the following:

Ctrl + H
Find what: ^(.+)$(?:\R\1)+
Replace with: $1
Check Wrap around
Check Regular expression
Do NOT check . matches newline
Replace all

Explanation:

^(.+)$      : group 1, a whole line
(?:         : non capture group
    \R      : any kind of line break
    \1      : backreference to group 1
)+          : group must appear 1 or more times

Replacement:

$1          : content of group 1

Result for the given example:

operator.livrareflori.md
amazon.de
fonts.gstatic.com
erovoyeurism.net
tugtechnologyandbusiness.com
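
The same annotated pattern can be sketched in Python for use in a script. Python's built-in `re` module has no `\R`, so this version substitutes `\r?\n` (Windows and Unix line breaks only), which is an assumption on my part. The name `dedupe_runs` and the extra lookahead that pins the end of the repeated line are also illustrative additions, not part of the Notepad++ recipe.

```python
import re

ADJACENT_DUPLICATES = re.compile(
    r"""
    ^([^\r\n]+)        # group 1: a whole line, without its line break
    (?:                # non-capturing group
        \r?\n          # a Windows or Unix line break (stand-in for \R)
        \1             # backreference to group 1
        (?=\r?$)       # the repeated line must end here as well
    )+                 # the group must appear 1 or more times
    """,
    re.MULTILINE | re.VERBOSE,
)


def dedupe_runs(text: str) -> str:
    """Keep one copy of each run of identical consecutive lines."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)
```

Because group 1 excludes the line break itself, the pattern should behave the same on files saved with `\r\n` and with `\n` endings.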

Answer 3

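The trick is to capture the line and use a lookahead to verify that the same line occurs again later in the subject. This expression matches the duplicates, and replacing them with "" keeps the last occurrence of each: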


https://regex101.com/r/Cs0bmY/10

Answer 4

((?:https?://)?(?:www\.)?\S+\.\S+)\s(?=[\s\S]*\1)

You can try this. See the demo.

https://regex101.com/r/Cs0bmY/11
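
To get a feel for how this lookahead pattern behaves, here is a small sketch that runs it through Python's `re.sub`. The function name `remove_earlier_occurrences` is hypothetical. Note that the lookahead scans all of the remaining text, so, as far as I can tell, it removes every earlier occurrence of a URL and not only the directly adjacent ones, and the copy that survives is the last one.

```python
import re

# The pattern from the answer, unchanged: capture a URL-like token,
# consume the whitespace after it, and require that the same text
# appears again somewhere later in the input.
EARLIER_OCCURRENCE = re.compile(
    r"((?:https?://)?(?:www\.)?\S+\.\S+)\s(?=[\s\S]*\1)"
)


def remove_earlier_occurrences(text: str) -> str:
    """Delete each URL that shows up again later, keeping the last copy."""
    return EARLIER_OCCURRENCE.sub("", text)


if __name__ == "__main__":
    # The second a.com is separated from the first by b.de,
    # yet the first one is still removed.
    print(remove_earlier_occurrences("a.com\nb.de\na.com\n"))
```

On the sample list in the question this gives the expected result, since no URL there reappears after a different one. If non-adjacent repeats must survive, a line-anchored pattern with a backreference to the immediately following line is the safer choice.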

Regex to remove repeated, uninterrupted strings

4 answers: