Help with inverse matching in Python

Time: 2016-04-05 21:11:28

Tags: python regex inverse-match

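Hello, I am trimming McAfee log files and removing all the "is OK" lines and other reported instances that I am not interested in. Until now we used a shell script that relied on grep's -v option, but we are looking for a Python script that runs on both Linux and Windows. After a few attempts I got a regex working in an online regex builder, but I am having a hard time implementing it in my script.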

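Edit: I want to remove the "is OK", "is a broken", "is a block", and "file could not be opened" lines, so that I am left with a file containing only the problems I am interested in. In shell it looks something like this: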

grep -v "is OK" ${OUTDIR}/${OUTFILE} | grep -v "is a broken" | grep -v "file could not be opened" | grep -v "is a block" > ${OUTDIR}/${OUTFILE}.trimmed 2>&1

Here is where I read in and search the file:

import re

f2 = open(outFilePath)
contents = f2.read()
print contents
p = re.compile("^((?!(is OK)|(file could not be opened)| (is a broken)|(is a block)))*$", re.MULTILINE | re.DOTALL)
m = p.findall(contents)
print len(m)
for iter in m:
    print iter
f2.close()

An example of the file I am searching:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
/tmp/tmp.BQshVRSiBo ... is OK.
/tmp/keyring-F6vVGf/socket ... file could not be opened.
/tmp/keyring-F6vVGf/socket.ssh ... file could not be opened.
/tmp/keyring-F6vVGf/socket.pkcs11 ... file could not be opened.
/tmp/yum.log ... is OK.
/tmp/tmp.oW75zGUh4S ... is OK.
/tmp/.X11-unix/X0 ... file could not be opened.
/tmp/tmp.LCZ9Ji6OLs ... is OK.
/tmp/tmp.QdAt1TNQSH ... is OK.
/tmp/ks-script-MqIN9F ... is OK.
/tmp/tmp.mHXPvYeKjb/mcupgrade.conf ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uninstall-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/mcscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/install-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/readme.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan_secure ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/signlic.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/liblnxfv.so.4 ... is OK.

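But I am not getting the correct output. I have tried removing the MULTILINE and DOTALL options, but I still do not get the correct result. Below is the output when running with DOTALL and MULTILINE.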

9
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')

Any help would be greatly appreciated!! Thanks!

4 Answers:

Answer 0: (score: 2)

It may be simpler to think about this line by line:

import re
import sys

pattern = re.compile(r"(is OK)|(file could not be opened)|(is a broken)|(is a block)")

with open(sys.argv[1]) as handle:
    for line in handle:
        if not pattern.search(line):
            sys.stdout.write(line)

Output:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
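
If the goal is to reproduce the shell pipeline, which writes a `.trimmed` file rather than printing, the same line-by-line filter could be wrapped in two small functions. This is only a sketch: `trim_lines` and `trim_file` are names made up for illustration, and the phrase list is the one from the question.

```python
import re

# Phrases that mark lines we are not interested in (taken from the question).
UNWANTED = re.compile(r"is OK|file could not be opened|is a broken|is a block")


def trim_lines(lines):
    """Yield only the lines that contain none of the unwanted phrases."""
    for line in lines:
        if not UNWANTED.search(line):
            yield line


def trim_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping the unwanted lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(trim_lines(src))
```

Calling `trim_file(path, path + ".trimmed")` would then mirror what the grep chain does, and because it only uses the standard library it should behave the same on Linux and Windows.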

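Answer 1: (score: 0)

Sometimes a regex makes things more complicated than they need to be. If you really are only looking for these patterns, I would probably just try this simple approach: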

terms = (
    'is OK',
    'file could not be opened',
    'is a broken',
    'is a block',
)

with open('/tmp/sample.log') as f:
    for line in f:
        if line.strip() and not any(term in line for term in terms):
            print(line, end='')

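It probably will not be faster than the regex, but it is about as simple as it gets. Alternatively, you could use a slightly stricter approach: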

terms = (
    'is a broken',
    'is a block',
)

with open('/tmp/samplelog.log') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        elif line.endswith('is OK.'):
            continue
        elif line.endswith('file could not be opened.'):
            continue
        elif any(term in line for term in terms):
            continue
        print(line)

Which approach I would take depends largely on who I expect to be using the script :)
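
As a side note on the stricter variant: `str.endswith` accepts a tuple of suffixes, so the chain of `elif` branches can be collapsed into one predicate. A minimal sketch, where `is_interesting`, `SKIP_SUFFIXES`, and `SKIP_TERMS` are names invented here for illustration:

```python
# Lines ending in one of these are dropped.
SKIP_SUFFIXES = ('is OK.', 'file could not be opened.')
# Lines containing one of these anywhere are dropped.
SKIP_TERMS = ('is a broken', 'is a block')


def is_interesting(line):
    """Return True for non-blank lines that match none of the skip rules."""
    line = line.strip()
    if not line:
        return False
    if line.endswith(SKIP_SUFFIXES):  # endswith() accepts a tuple of suffixes
        return False
    return not any(term in line for term in SKIP_TERMS)
```

Having the rule in one function also makes it easy to check a handful of sample lines before running it over a full log.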

Answer 2: (score: 0)

Try this (it does the job in a single line)

p = re.compile("^(?:[if](?!s OK|s a broken|s a block|ile could not be opened)|[^if])*$")

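This means that if a line contains an "i" or an "f", that character must not be followed by any of the suffixes listed above; any character that is not "i" or "f" is fine. This check is repeated for every character in the line.

Edit: After testing on regex101.com, I found that it did not work correctly. Here is the one-line regex.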

p = re.compile("^(?:[^if\n]|[if](?!s OK|s a broken|s a block|ile could not be opened))*$", re.MULTILINE)
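
To see how a pattern of this shape behaves, the sketch below runs it with `findall` over a few lines taken from the sample log. The variable names are mine, and the list comprehension that drops empty strings is an addition, since blank lines in the input also match as empty strings.

```python
import re

# Each character is either not "i"/"f", or an "i"/"f" that does not
# start one of the unwanted phrases.
pattern = re.compile(
    r"^(?:[^if\n]|[if](?!s OK|s a broken|s a block|ile could not be opened))*$",
    re.MULTILINE,
)

text = "\n".join([
    "eth0",
    "/tmp/yum.log ... is OK.",
    "/tmp/.X11-unix/X0 ... file could not be opened.",
    "No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY",
])

# No capturing groups, so findall() returns the matched lines themselves.
kept = [line for line in pattern.findall(text) if line]
print("\n".join(kept))
```

The "is OK" and "file could not be opened" lines are left out, while the "No file or directory found" line survives even though it contains the word "file".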

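Answer 3: (score: 0)

I know it is rather late to answer, but I see that none of the answers gives the correct solution.

Your regex is wrong in this case. You have unnecessary extra groups, and the '.' is missing. Also, it only detects these phrases when the line begins with "is OK|file could not be opened|is a broken".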

"hello world is OK": does not match  
"is OK hello world": matches
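
The difference comes down to where the lookahead is evaluated. A small sketch, using simplified single-phrase patterns and a helper name (`is_rejected`) that are mine, not the asker's:

```python
import re

# Lookahead tested only at the start of the line (no ".*" inside it).
at_start_only = re.compile(r"^(?!is OK)")
# Lookahead that scans the rest of the line thanks to the leading ".*".
anywhere = re.compile(r"^(?!.*is OK)")


def is_rejected(pattern, line):
    """True when the negative lookahead blocks the line."""
    return pattern.search(line) is None


for line in ("hello world is OK", "is OK hello world"):
    print(line, is_rejected(at_start_only, line), is_rejected(anywhere, line))
```

Without the `.*`, only the line that starts with the phrase is rejected; with it, both lines are.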

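In an inverse match, use only the non-capturing group '(?:)' instead of the capturing group '()'. This avoids getting empty strings back.

If you want to remove the whole sentence, you can use the following expression: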

 r"^(?!.*(?:is OK|is a broken|file could not be opened)).*"
"is OK. hello world": matches  
"hello world is OK.": matches  
"is Ok.": matches

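If you want to remove the whole sentence, but only when it ends with "is OK.|file could not be opened.|is a broken.", you can use the following expression: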

r"^(?!.*(?:is OK|is a broken|file could not be opened)\.$).*"
"is OK. hello world": does not match  
"hello world is OK.": matches  
"is Ok.": matches
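
To make the two variants concrete, here is a sketch that reports which lines each expression keeps. The helper `is_kept` and the constant names are made up for illustration; "kept" means the expression as a whole matches, that is, the line survives the filter.

```python
import re

# Drops a line if a phrase occurs anywhere in it.
ANYWHERE = re.compile(r"^(?!.*(?:is OK|is a broken|file could not be opened)).*")
# Drops a line only if it ends with a phrase followed by a period.
AT_END = re.compile(r"^(?!.*(?:is OK|is a broken|file could not be opened)\.$).*")


def is_kept(pattern, line):
    """True when the line survives the inverse match."""
    return pattern.match(line) is not None


for line in ("is OK. hello world", "hello world is OK.", "eth0"):
    print(repr(line), is_kept(ANYWHERE, line), is_kept(AT_END, line))
```

The only line the two expressions disagree on is "is OK. hello world", where the phrase is present but not at the end.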

Remember to use the non-capturing group '(?:)' rather than the capturing group '()', otherwise you will get empty strings:

# Capturing group
regex = r"^(?!.*(is OK|file could not be opened|is a broken|is a block)).*"
print(re.findall(regex,text,flags=re.MULTILINE))

Output:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

Use the join() function to get the full text:

# Non-capturing group
regex = r"^(?!.*(?:is OK|file could not be opened|is a broken|is a block)).*"
print("\n".join(re.findall(regex,text,flags=re.MULTILINE)))

Output:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
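
The reason for the empty strings is how `re.findall` works: when the pattern contains a capturing group, it returns the group's content instead of the whole match, and a group inside a negative lookahead never captures anything. A self-contained sketch on a shortened input (the four-line `text` and the `/tmp/example` path are placeholders, not the real log):

```python
import re

text = "\n".join([
    "eth0",
    "/tmp/yum.log ... is OK.",
    "",
    "No file or directory found matching /tmp/example",  # hypothetical path
])

phrases = "is OK|file could not be opened|is a broken|is a block"
capturing = r"^(?!.*(" + phrases + r")).*"        # group 1 is returned
non_capturing = r"^(?!.*(?:" + phrases + r")).*"  # whole match is returned

with_group = re.findall(capturing, text, flags=re.MULTILINE)
without_group = re.findall(non_capturing, text, flags=re.MULTILINE)

print(with_group)     # ['', '', '']
print(without_group)  # ['eth0', '', 'No file or directory found matching /tmp/example']
```

The same mechanism appears to explain the question's output: the original pattern has five capturing groups, so `findall` returns tuples of five strings, all of them empty.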
