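So I've been trying to find a way to use regular expressions (regex) to remove duplicate email addresses from a text file I have, but I haven't been able to get anything to work.
This is how the emails appear in the text file (example):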
examp@asdas.com
kork@kruu.com
gexx@moxx.com
hey@hayhay.cu
examp@asdas.com
geexx@modxx.com
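I haven't found a way to remove all of the duplicates; the only regex approach I've found removes duplicates that are directly next to each other.
Does anyone have any suggestions?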
Answer 0 (score: 0)
How about:
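Search: ([^@]+@[^@]+)(.*?)\1
Replace with: $1$2
Note that (.*?) has to cross line breaks to reach a later copy of the address, so enable your tool's "dot matches newline" option if it has one. The pattern is also not anchored to line boundaries, so it can match a fragment shared by two different addresses (for example exx@mo in gexx@moxx.com and geexx@modxx.com); check the result before saving.
Regex explanation: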
The regular expression:
(?-imsx:([^@]+@[^@]+)(.*?)\1)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^@]+                    any character except: '@' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    @                        '@'
----------------------------------------------------------------------
    [^@]+                    any character except: '@' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \1                       what was matched by capture \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
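If you can run a script instead of an editor's search/replace, the same backreference idea can be sketched in Python. This is only a sketch under some assumptions: one address per line, \n line endings, and a line-anchored variant of the pattern (not the exact one above) so that a match always covers whole lines. The name remove_duplicate_lines is made up for this example. Because one pass removes only one later copy per match, the substitution is repeated until the text stops changing.

```python
import re

# Line-anchored variant of the pattern above (an assumption, not the
# original): group 1 must be one complete line containing a single '@'.
# re.MULTILINE makes ^ and $ match at line boundaries; re.DOTALL lets the
# lazy (.*?) cross line breaks to reach a later copy of the same line.
DUPLICATE_LINE = re.compile(
    r"^([^@\n]+@[^@\n]+)$(.*?)\n\1$",
    re.MULTILINE | re.DOTALL,
)


def remove_duplicate_lines(text: str) -> str:
    """Keep the first occurrence of each address line, drop later copies."""
    while True:
        # Keep the first copy (\1) and everything in between (\2);
        # the newline and the later copy are dropped.
        new_text = DUPLICATE_LINE.sub(r"\1\2", text)
        if new_text == text:
            return new_text
        text = new_text


sample = (
    "examp@asdas.com\n"
    "kork@kruu.com\n"
    "gexx@moxx.com\n"
    "hey@hayhay.cu\n"
    "examp@asdas.com\n"
    "geexx@modxx.com\n"
)

print(remove_duplicate_lines(sample), end="")
# examp@asdas.com
# kork@kruu.com
# gexx@moxx.com
# hey@hayhay.cu
# geexx@modxx.com
```

Anchoring with ^ and $ is what keeps gexx@moxx.com and geexx@modxx.com from being treated as partial duplicates of each other. Repeated backreference searches get slow on large files, so treat this as an illustration of the regex rather than a tuned tool.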