Remove similar lines in Notepad++

Date: 2019-07-18 16:23:42

Tags: notepad++

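I have a file with 225,000 lines, many of which are similar to one another. I want to remove all the similar lines and keep only the first line of each "type". An example follows.

I would like a file that looks like this: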

./ACT_HERE_REPORT_MEMO_APPROVED_20180510_083000.log.gz
./ACT_HERE_REPORT_MEMO_APPROVED_20180512_083000.log.gz
./ACT_HERE_REPORT_MEMO_APPROVED_20180513_083000.log.gz
./ACT_HERE_REPORT_MEMO_APPROVED_20180515_083000.log.gz
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180326.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180327.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180328.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180329.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180331.xls
./Archive/20150919-084501.SOMETHING
./Archive/20150922-084501.SOMETHING
./Archive/20150923-084500.SOMETHING
./Archive/20150924-084500.SOMETHING
./TEST/TEST.20170310.20170310-181017.txt.gz
./TEST/TEST.20170310.20170310-201023.txt.gz
./TEST/TEST.20170313.20170313-011035.txt.gz
./TEST/TEST.20170313.20170313-024006.txt.gz
./TEST/TEST.20170313.20170313-041018.txt.gz
./TEST/TEST.20180402-011024.log.gz
./TEST/TEST.20180402-011200.log.gz
./TEST/TEST.20180402-061113.log.gz
./TEST/TEST.20180402-081013.log.gz
./TEST/TEST.20180402-101012.log.gz

to end up like this:

./ACT_HERE_REPORT_MEMO_APPROVED_20180510_083000.log.gz
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180326.xls
./Archive/20150919-084501.SOMETHING
./TEST/TEST.20170310.20170310-181017.txt.gz
./TEST/TEST.20180402-011024.log.gz

1 Answer:

Answer 0: (score: 5)

  • Ctrl + H
  • Find what: ((^.+?)[-_.\d]+(\..+\R))(?:\2[-_.\d]+\3)+
  • Replace with: $1
  • Check Wrap around
  • Check Regular expression
  • Uncheck . matches newline
  • Replace all
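For a file this large it may be convenient to run the same replacement outside the editor. The following is a minimal Python sketch of the steps above, not part of the original answer. It assumes that `\r?\n` is an acceptable stand-in for `\R` (Python's built-in `re` module does not support `\R`) and that every line, including the last one, ends with a linebreak. The function name `collapse_similar_lines` and the sample data are illustrative only.

```python
import re

# Same expression as in "Find what", with \R replaced by \r?\n because
# Python's built-in re module has no \R escape.
PATTERN = re.compile(
    r"((^.+?)[-_.\d]+(\..+\r?\n))(?:\2[-_.\d]+\3)+",
    re.MULTILINE,  # let ^ match at the start of every line
)


def collapse_similar_lines(text: str) -> str:
    """Replace each run of similar lines with its first line (group 1)."""
    return PATTERN.sub(r"\1", text)


# Small sample taken from the question; every line ends with a newline.
sample = (
    "./Archive/20150919-084501.SOMETHING\n"
    "./Archive/20150922-084501.SOMETHING\n"
    "./Archive/20150923-084500.SOMETHING\n"
    "./TEST/TEST.20180402-011024.log.gz\n"
    "./TEST/TEST.20180402-011200.log.gz\n"
)

print(collapse_similar_lines(sample), end="")
```

On this sample the output should be the first `./Archive/` line followed by the first `./TEST/` line. As in Notepad++, a final line without a trailing linebreak would not take part in a match.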

Explanation:

(                   # start group 1
  (                 # start group 2
    ^               # beginning of line
    .+?             # 1 or more any character but newline, not greedy
  )                 # end group 2
  [-_.\d]+          # 1 or more hyphen, underscore, dot or digit
  (                 # start group 3
    \.              # a dot
    .+              # 1 or more any character but newline
    \R              # any kind of linebreak
  )                 # end group 3
)                   # end group 1
(?:                 # non capture group
  \2                # backreference to group 2
  [-_.\d]+          # 1 or more hyphen, underscore, dot or digit
  \3                # backreference to group 3
)+                  # end group, must appear 1 or more times
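To see what each group captures, the expression can be run against a few lines and the groups inspected. This is a small illustrative sketch in Python, again assuming `\r?\n` in place of `\R`; the variable names are hypothetical.

```python
import re

pattern = re.compile(
    r"((^.+?)[-_.\d]+(\..+\r?\n))(?:\2[-_.\d]+\3)+",
    re.MULTILINE,
)

text = (
    "./Archive/20150919-084501.SOMETHING\n"
    "./Archive/20150922-084501.SOMETHING\n"
    "./TEST/TEST.20180402-011024.log.gz\n"
)

m = pattern.search(text)
assert m is not None

print(repr(m.group(1)))  # the whole first line, which is kept
print(repr(m.group(2)))  # the common prefix before the date/time part
print(repr(m.group(3)))  # the extension plus the linebreak
```

Here group 2 is `./Archive/` and group 3 is `.SOMETHING` followed by the newline, so any following line with the same prefix and extension but a different run of digits, hyphens, underscores or dots is consumed by the non-capture group and dropped by the replacement. The lone `./TEST/` line is left alone because it has no similar line after it.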

Result for the given example:

./ACT_HERE_REPORT_MEMO_APPROVED_20180510_083000.log.gz
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180326.xls
./Archive/20150919-084501.SOMETHING
./TEST/TEST.20170310.20170310-181017.txt.gz
./TEST/TEST.20180402-011024.log.gz
