Delete specific lines from a text file

Time: 2014-08-05 19:54:48

Tags: regex string text

I have a huge log file that contains many lines like these:

...
Useful stuff
...
Finished 0 of 435
Finished 1 of 435
...
Finished 435 of 435
...
Other useful stuff

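How can I elegantly remove all of the "Finished n of N" lines except the final "Finished N of N" line?

This needs to be done on Windows, for example with Python or GNU tools.

2 answers:

Answer 0: (score: 2)

You can use awk: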

awk '/^Finished/ && $2!=$4 {next}1' logfile
...
Useful stuff
...
...
Finished 435 of 435
...
Other useful stuff

Note: On Windows, you may have to use double quotes instead of single quotes.
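Since the question also mentions Python, the same filter can be sketched there. This is only a rough equivalent of the awk one-liner, not part of the original answer: the names `keep_line` and `filter_log` are made up for illustration, and the fields are compared as strings, so "007" and "7" would count as different.

```python
def keep_line(line):
    """Return False for 'Finished n of N' lines where n != N, True otherwise."""
    fields = line.split()
    if line.startswith("Finished") and len(fields) >= 4:
        # fields[1] is n and fields[3] is N, like $2 and $4 in awk
        return fields[1] == fields[3]
    return True


def filter_log(lines):
    """Lazily yield only the lines that should be kept."""
    return (line for line in lines if keep_line(line))


sample = [
    "Useful stuff\n",
    "Finished 0 of 435\n",
    "Finished 434 of 435\n",
    "Finished 435 of 435\n",
    "Other useful stuff\n",
]
filtered = list(filter_log(sample))
print("".join(filtered), end="")
```

Because `filter_log` accepts any iterable of lines, an open file object can be passed in directly and the result written with `writelines`, so a huge log is processed one line at a time rather than loaded into memory.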

Answer 1: (score: 2)

You can try replacing matches of this pattern with an empty string:

^Finished (\d+) of (?!\1)\d+$

Sample code:

import re

# re.MULTILINE makes ^ and $ match at the start and end of every line
p = re.compile(r'^Finished (\d+) of (?!\1)\d+$', re.MULTILINE | re.IGNORECASE)
test_str = "..."
subst = ""

result = p.sub(subst, test_str)

Pattern explanation:

  ^                        the beginning of a line (with re.MULTILINE)
  Finished                 'Finished '
  (                        group and capture to \1:
    \d+                      digits (0-9) (1 or more times)
  )                        end of \1
   of                      ' of '
  (?!                      look ahead to see if there is not:
    \1                       what was matched by capture \1
  )                        end of look-ahead
  \d+                      digits (0-9) (1 or more times)
  $                        the end of a line (with re.MULTILINE)
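One detail the explanation implies: `$` matches just before the newline without consuming it, so replacing a match with an empty string leaves an empty line behind. The sketch below shows that on a small made-up sample, along with one possible variant (an assumption, not from the original answer) that appends `\n?` to remove the line break as well.

```python
import re

# Made-up sample text standing in for the real log
text = (
    "Useful stuff\n"
    "Finished 0 of 435\n"
    "Finished 435 of 435\n"
    "Other useful stuff\n"
)

# Pattern as given in the answer: the matched line becomes an empty line
pattern = re.compile(r'^Finished (\d+) of (?!\1)\d+$', re.MULTILINE)
blanked = pattern.sub("", text)

# Variant that also consumes the trailing newline, so the line disappears
pattern_nl = re.compile(r'^Finished (\d+) of (?!\1)\d+$\n?', re.MULTILINE)
removed = pattern_nl.sub("", text)

print(repr(blanked))
print(repr(removed))
```

Which form is preferable depends on whether the blank lines matter in the cleaned log.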

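Edit

Following the comment below, the regex pattern was changed slightly. The original lookahead (?!\1) also rejects any total that merely begins with the captured number, so a line such as "Finished 4 of 435" was not matched and would have been kept. Adding $ inside the lookahead rejects only a total that is exactly equal to the captured number: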

^Finished (\d+) of (?!\1$)\d+$
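The difference between the two patterns is easiest to see on lines where the progress counter is a prefix of the total. The following is a small sketch with made-up sample lines. `original` and `revised` are just illustrative variable names.

```python
import re

original = re.compile(r'^Finished (\d+) of (?!\1)\d+$', re.MULTILINE)
revised = re.compile(r'^Finished (\d+) of (?!\1$)\d+$', re.MULTILINE)

lines = [
    "Finished 4 of 435",    # 435 starts with 4
    "Finished 43 of 435",   # 435 starts with 43
    "Finished 434 of 435",
    "Finished 435 of 435",  # the line that must be kept
]

# True means the line matches and would be deleted
original_hits = [bool(original.search(line)) for line in lines]
revised_hits = [bool(revised.search(line)) for line in lines]

for line, old, new in zip(lines, original_hits, revised_hits):
    print(f"{line!r}: original={old}, revised={new}")
```

With the original pattern the first two lines are not matched and would survive the substitution; the revised pattern matches every line except the final "Finished 435 of 435".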
