Question

I have a text log file that looks like this:

Line 1 - Date/User Information
Line 2 - Type of LogEvent
Line 3-X, variable number of lines with additional information,
          could be 1, could be hundreds

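Then the sequence repeats.

The log has roughly 20K lines, 50+ types of log events, and approx. 15K separate user/date events. I want to parse it in Python and make the information queryable.

So I thought I'd create a LogEvent class that records the user, the date (which I extract and convert to a datetime), the action, the description... something like: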


    class LogEvent():
        def __init__(self,date,user):
            self.date = date # string converted to datetime object
            self.user = user
            self.content = ""

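An event of this class would be created each time a line of text containing user/date information is parsed.

To add the log event type and any descriptive content, there could be something like this: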


    def classify(self,logevent):
        self.logevent = logevent

    def addContent(self,lineoftext):
        self.content += lineoftext

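To process the text file, I would use readline() and go one line at a time. If the line is a user/date line, I instantiate a new object and append it to a list...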


    newevent = LogEvent(date,user)
    eventlist.append(newevent)

...and keep adding the action/content to it until I encounter the next user/date line.


    eventlist[-1].classify(logevent)
    eventlist[-1].addContent(line)
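
Put together, the loop described above might look roughly like the sketch below. The header layout (`YYYY-MM-DD HH:MM:SS user`) and the helper names `parse_header` and `parse_log` are assumptions made purely for illustration, since the real Line 1 format is not shown here.

```python
from datetime import datetime


class LogEvent:
    def __init__(self, date, user):
        self.date = date      # datetime object
        self.user = user
        self.logevent = None  # filled in from the line after the header
        self.content = ""


def parse_header(line):
    """Return (date, user) if the line looks like a header, else None.

    Assumes a hypothetical 'YYYY-MM-DD HH:MM:SS user' layout.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    try:
        date = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return date, parts[2]


def parse_log(lines):
    """Group an iterable of text lines into a list of LogEvent objects."""
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        header = parse_header(line)
        if header is not None:
            events.append(LogEvent(*header))   # a new event starts here
        elif not events:
            continue                           # ignore text before the first header
        elif events[-1].logevent is None:
            events[-1].logevent = line         # first line after the header: event type
        else:
            events[-1].content += line + "\n"  # everything else: free-form detail
    return events
```

Because a file object is itself an iterable of lines, `parse_log(open(path))` would work without calling readline() explicitly.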

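All of that makes sense (unless you convince me there's a smarter way to do it, or a useful Python module I'm not aware of). What I'm trying to decide is how best to classify the log event type against a set list of 50+ possible types, because I don't want to just accept the whole line of text as the log event type. Instead, I need to compare the beginning of the line against a list of possible values...

What I don't want to do is 50 of these: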


    if line.startswith("ABC"):
        logevent = "foo"
    if line.startswith("XYZ"):
        logevent = "boo"

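I've considered using a dict as a lookup table, but I'm not sure how to do that with "startswith"... Any suggestions would be appreciated, and my apologies if I've been too long-winded.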

Answer 1

If you have a dictionary with the logEvent types as keys and the values you want for the logevent attribute as values, like this,

logEvents = {"ABC":"foo", "XYZ":"boo", "Data Load":"DLtag"}

and the line in the log file is

line = "Data Load: 127 row uploaded"

you can check whether any of the keys above is at the start of the line,

for k in logEvents:
    if line.startswith(k): 
        logevent = logEvents[k]

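This loops over all keys in logEvents and checks whether line starts with one of them. After the if condition you can do whatever you like. You can put this into a function that is called after the line of text containing the user/date information has been parsed. If you want to do something when no key is found, you can do it like this,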

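def get_logevent(line):
    # Return the value for the first key that line starts with.
    for k in logEvents:
        if line.startswith(k):
            return logEvents[k]
    # No key matched: report the offending line.
    raise ValueError("logEvent not recognized.\n line = " + line)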

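Note that the exact type of exception you raise isn't very important. I chose a built-in exception to avoid subclassing. All the built-in exceptions are listed in the Python documentation.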
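
One thing to watch with a prefix lookup is overlapping keys: if both "Data" and "Data Load" were keys, whichever the loop reaches first would win. A minimal sketch that tries the longest prefixes first is shown below. `LOG_EVENTS` mirrors the dictionary above, while `classify_line` and its `default` parameter are hypothetical names introduced here; returning a default instead of raising is just one possible choice.

```python
LOG_EVENTS = {"ABC": "foo", "XYZ": "boo", "Data Load": "DLtag"}

# Longest prefixes first, so a more specific key wins over a shorter one.
_PREFIXES = sorted(LOG_EVENTS, key=len, reverse=True)


def classify_line(line, default=None):
    """Return the tag of the longest key that `line` starts with, else `default`."""
    for prefix in _PREFIXES:
        if line.startswith(prefix):
            return LOG_EVENTS[prefix]
    return default
```

If only a yes/no answer is needed, `str.startswith` also accepts a tuple, so `line.startswith(tuple(LOG_EVENTS))` checks all keys in one call.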

Answer 2

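Since I didn't phrase my question very well, I thought about it some more and came up with this answer, which is similar to this thread.

I wanted a clean, easy-to-manage solution that handles each line of text differently depending on whether certain conditions are met. I didn't want to use a bunch of if/else clauses. So I tried moving the conditions and the consequences (the handling) into a decisionDict = {}.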

### RESPONSES WHEN CERTAIN CONDITIONS ARE MET - simple examples
def shorten(line):
    return line[:25]

def abc_replace(line):
    return line.replace("xyz","abc")

### CONDITIONAL CHECKS FOR CONTENTS OF LINES OF TEXT - simple examples
def check_if_string_in_line(line):
    return "xyz" in line

def check_if_longer_than25(line):
    return len(line) > 25

### DECISION DICTIONARY - could be extended for any number of condition/response
decisionDict = {check_if_string_in_line:abc_replace, check_if_longer_than25:shorten}

### EXAMPLE LINES OF SILLY TEXT
lines = ["Alert level raised to xyz",
    "user 5 just uploaded duplicate file",
    "there is confusion between xyz and abc"]

for line in lines:
    for check, action in decisionDict.items():
        if check(line):
            print(action(line))
            break  # stop at the first condition that holds

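This keeps all the conditions and actions completely separate. It does not yet allow multiple conditions to be applied to any one line of text. As soon as the first condition resolves to 'True', we move on to the next line of text.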
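
If applying several conditions to one line is wanted later, one possible variation is to keep the rules in a list of (condition, action) pairs, which makes the order explicit, and to feed each action's output into the next check. This is only a sketch: `RULES`, `process`, and the `first_match_only` flag are names made up for illustration.

```python
def shorten(line):
    return line[:25]


def abc_replace(line):
    return line.replace("xyz", "abc")


# Ordered (condition, action) pairs; earlier rules are tried first.
RULES = [
    (lambda line: "xyz" in line, abc_replace),
    (lambda line: len(line) > 25, shorten),
]


def process(line, rules=RULES, first_match_only=True):
    """Apply the actions whose conditions hold, in rule order.

    With first_match_only=False, every later condition is tested against
    the line as already transformed by the earlier actions.
    """
    for condition, action in rules:
        if condition(line):
            line = action(line)
            if first_match_only:
                break
    return line
```

Calling `process(line)` reproduces the first-match behaviour described above, while `process(line, first_match_only=False)` chains every matching action.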

Generating log event objects from a text log file
