Question

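I'm following someone's log files and they are a complete mess (no line breaks and no delimiters). So I wrote some simple regexes to tidy the logs up. The log #code#s are now nicely separated in a list, with each code's string attached to it in a sub-dictionary, like this: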

Dict [
    0 : [LOGCODE_53 : 'The string etc etc']
]
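A minimal sketch of how the `re.findall` tuples could be turned into that structure. It assumes a made-up three-record log string (the record texts are hypothetical) and stores one single-entry dict per record, keyed by its code:

```python
import re

# Hypothetical run-on log text: no separators, each record starts with 00XX::
text = ("0053::Alice's phone connected today with ZZZ device 7."
        "0012::Disk full."
        "0053::Bob's tablet connected now with ZZZ device 9.")

# Same splitting pattern as in the question: capture the 2-char code and
# everything up to (but not including) the next 00XX:: marker
matches = re.findall(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)', text)

# Index each record by its position; each value maps the log code to its message
records = {i: {code: message} for i, (code, message) in enumerate(matches)}

for index, record in records.items():
    print(index, record)
```

Note that two records can share the same code (`53` here), which is why the position, not the code, is used as the outer key.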

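Since that was easy, my plan was to add some log recognition straight away. I can match the LOGCODE, but the problem is that the codes don't mean anything on their own, and different LOGCODEs often carry the same output string.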

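So I wrote some regexes to detect the content of the logs. My question now is: what is the sensible way to detect a wide variety of string patterns? There are probably around 110 different types of strings, and they differ so much that it is impossible to "super-match" them with a single pattern. How can I run ~110 regexes against a string to find out the string's intent, so that I can index the records into logical registers?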

Something like: "take this $STRING, test it against every $REGEX in this $LIST, and tell me which $REGEX(es) (by index) match the string".

My code:

import re

# Open, read out and close the log file (text mode, so the str regexes below apply)
with open('000000df.log', 'r') as f:
    text = f.read()

# Split the run-on log into (code, message) pairs; each record starts with 00XX::
matches = re.findall(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)', text)

print('Matches: ' + str(len(matches)))
print('==========================================================================================')

for code, message in matches:
    submatching = re.findall(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\.", message)

    print(code + ' >>> ' + message)
    # Not every record has this shape; skip the breakdown instead of raising IndexError
    if submatching:
        print(code + ' >>> ' + ', '.join(submatching[0]))

Answer 1

If a particular regex doesn't match, re.match and re.search return None (re.findall returns an empty list instead), so you can iterate over the possible regexes and test each one:

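import re

# One compiled pattern per kind of log message
tests = [
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...')
]

for test in tests:
    # findall returns an empty (falsy) list when the pattern does not match
    matches = test.findall(your_string)

    if matches:
        print(test.pattern, 'works')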
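To get exactly what the question asks for (which regexes match a string, by index) and to file each message into a per-type "register", the same loop can be wrapped with `enumerate`. A minimal sketch; the pattern list, the register names, and the helpers `matching_indexes` and `build_registers` are all hypothetical. It uses `re.search` rather than `findall`, since the first hit is enough to know that a pattern matches:

```python
import re

# Hypothetical pattern list: (register name, compiled regex) pairs.
# In practice this would hold the ~110 patterns; these names are made up.
PATTERNS = [
    ("device_connected", re.compile(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\.")),
    ("disk_full", re.compile(r"Disk full")),
    ("mentions_device", re.compile(r"ZZZ device")),
]


def matching_indexes(string, patterns):
    """Return the index of every pattern that matches anywhere in string."""
    return [i for i, (_, regex) in enumerate(patterns) if regex.search(string)]


def build_registers(messages, patterns):
    """Group messages by the name of each pattern they match.

    A message matching several patterns lands in several registers;
    a message matching none goes into the 'unmatched' register.
    """
    registers = {}
    for message in messages:
        hits = matching_indexes(message, patterns)
        if not hits:
            registers.setdefault("unmatched", []).append(message)
        for i in hits:
            registers.setdefault(patterns[i][0], []).append(message)
    return registers


messages = [
    "Alice's phone connected today with ZZZ device 7.",
    "Disk full.",
    "Unknown event.",
]

print(matching_indexes(messages[0], PATTERNS))  # → [0, 2]
print(build_registers(messages, PATTERNS))
```

Returning every matching index fits the question's "REGEX(es)" wording; if each message should get exactly one type, order the list from most to least specific and keep only the first index.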

Check a string against multiple regexes in a for loop
