Question

I'm hoping to find an answer on how to remove lines that have a duplicate keyword or IP address. For example:

169.146.25.111 1412969662.95 This is just to make it unique
169.146.25.111 1412969662.95 This data doesn't matter
169.146.25.111 1712515362.95 This is all different
169.146.25.112 1412969662.95 Don't care what's here
169.146.25.111 1315125152.95 erroneous information

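So I want it to match the IP address, then search the following lines, and if it finds that IP address at the start of a line, delete that line. This is what I have been trying to work with: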

Find what:
^(\S+)(.*?)$\s+(?=.*^\1).*?$
Replace With:
\1\2

Desired result:

169.146.25.111 1412969662.95 This is just to make it unique
169.146.25.112 1412969662.95 Don't care what's here

I'm looking for a Regex answer. I know this can be done easily with sort or awk, but I've been trying to get it to work with Regex and it's hurting my brain. Thanks.

Answer 1

An example for IP addresses, using a global search and an empty replacement string (the dotall option must be unchecked):

^(\S++).*\R(?=(?>.*\R)*?\1 )

Pattern description:

^              # start-of-line anchor
(\S++)         # captures all non-whitespace characters;
               # the possessive quantifier '++' forbids backtracking
.*             # everything up to the newline (dotall mode disabled)
\R             # a newline (whatever the system uses: \r, \r\n or \n)
(?=            # open a lookahead test
    (?>        # open an atomic group (forbids backtracking once closed)
        .*\R   # a line (with its trailing newline)
    )*?        # the atomic group may occur zero or more times (lazily)
    \1         # backreference to the capture group (followed by a literal space in the pattern)
)              # close the lookahead
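
To see what this pattern does to the sample data, here is a minimal Python sketch. It is an adaptation rather than the exact pattern above: Python's `re` module has no `\R`, and possessive quantifiers and atomic groups only arrived in Python 3.11, so the sketch assumes `\n` line endings and uses a literal space after the capture group to pin it to the whole first field. The variable names are illustrative only.

```python
import re

# Sample lines from the question.
text = (
    "169.146.25.111 1412969662.95 This is just to make it unique\n"
    "169.146.25.111 1412969662.95 This data doesn't matter\n"
    "169.146.25.111 1712515362.95 This is all different\n"
    "169.146.25.112 1412969662.95 Don't care what's here\n"
    "169.146.25.111 1315125152.95 erroneous information\n"
)

# Assumed adaptation of the pattern above:
#   \n replaces \R (not supported by Python's re),
#   "(\S+) " (group followed by a space) stands in for the possessive "(\S++)",
#   "(?:...)" replaces the atomic group "(?>...)".
# A line is matched (and deleted) when some LATER line starts with the same key.
pattern = re.compile(r"^(\S+) .*\n(?=(?:.*\n)*?\1 )", re.MULTILINE)

deduped = pattern.sub("", text)
print(deduped, end="")
```

On the sample data this leaves the `169.146.25.112` line and the final `169.146.25.111` line. Because a line is removed whenever the same key appears further down, the approach keeps the last occurrence of each key rather than the first, so the surviving `.111` line differs from the one shown in the desired result.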

Answer 2

Based on the OP's sample pattern and the data provided, this works only for consecutive lines:

^(\S++)(.*)(?:\R\1.*)+

and replace with \1\2; the dotall option must be unchecked.
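
A minimal Python sketch of the same idea, to show the effect of the "consecutive lines only" restriction on the sample data. As an assumption for portability, it swaps `\R` for `\n` and replaces the possessive `++` with a `(?!\S)` guard after the key, which stops the capture group from shrinking to a prefix of the first field; the variable names are illustrative only.

```python
import re

# Sample lines from the question.
text = (
    "169.146.25.111 1412969662.95 This is just to make it unique\n"
    "169.146.25.111 1412969662.95 This data doesn't matter\n"
    "169.146.25.111 1712515362.95 This is all different\n"
    "169.146.25.112 1412969662.95 Don't care what's here\n"
    "169.146.25.111 1315125152.95 erroneous information\n"
)

# Assumed adaptation of the pattern above:
#   \n replaces \R, and "(?!\S)" after the key replaces the possessive "++".
# Group 1 is the key, group 2 the rest of the first line; the trailing
# "(?:\n\1...)+" swallows every directly following line with the same key.
pattern = re.compile(r"^(\S+)(?!\S)(.*)(?:\n\1(?!\S).*)+", re.MULTILINE)

# Keep only the first line of each run of consecutive duplicates.
collapsed = pattern.sub(r"\1\2", text)
print(collapsed, end="")
```

The first three lines collapse into the first one, but the last `169.146.25.111` line survives because the `169.146.25.112` line separates it from the run, which is the limitation this answer states.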
