Regex logic for parsing a custom-format date does not work?

Time: 2017-08-30 07:38:32

Tags: python regex generator yield

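Question: I am parsing logs from a service that was installed with a custom date field. I want to match each log line and then check whether new entries have arrived in the log file.

To do that with a regular expression, I need to match the date at the start of each log line exactly. The relevant part of my code is attached below.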

Code:

    def matchDate(self, line):
        matchThis = ""
        # Thu Jul 27 00:03:27 2017
        matched = re.match(r'\d\d\d\ \d\d\d \d\d\ \d\d:\d\d:\d\d \d\d\d\d', line)
        print matched
        if matched:
            # matches a date and adds it to matchThis
            matchThis = matched.group()
            print 'Match found {}'.format(matchThis)
        else:
            matchThis = "NONE"
        return matchThis

    def log_parse(self):
        currentDict = {}
        with open(self.default_log, 'r') as f:
            for line in f:
                print line
                if line.startswith(self.matchDate(line), 0, 24):
                    if currentDict:
                        yield currentDict
                    currentDict = {
                        "date": line.split('[')[0][:24],
                        "no": line.split(']')[0][-4:-1],
                        "type": line.split(':')[0][-4:-1],
                        "text": line.split(':')[1][1:]
                    }
                else:
                    pass
                    # currentDict['text'] += line
            yield currentDict

Nothing matched with the code above, so I tried to fix it with this new regex:

'[A-Za-z]{3} [A-Za-z]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}'

Here it is in a regex editor: [http://regexr.com/3gl67]

Any suggestions on how to solve this and match the log lines exactly?

Sample log:

Wed Aug 30 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.
Wed Aug  2 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.
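
For reference, here is a minimal sketch of how the two patterns above behave on these two sample lines. The first pattern uses `\d` for the weekday and month, which only matches digits, so it cannot match `Wed` or `Aug`. The second pattern requires a two-digit day, so it misses the space-padded `Aug  2`. The names `DIGITS_ONLY`, `TWO_DIGIT_DAY`, `PADDED_DAY`, and `check` are hypothetical, and `PADDED_DAY` is only one possible way to accept a padded day.

```python
import re

# The two sample lines from the log; the second has a space-padded day.
SAMPLES = [
    "Wed Aug 30 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.",
    "Wed Aug  2 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.",
]

# Original pattern: \d only matches digits, so "Wed" and "Aug" never match.
DIGITS_ONLY = re.compile(r'\d\d\d\ \d\d\d \d\d\ \d\d:\d\d:\d\d \d\d\d\d')

# Second pattern: letters are accepted, but the day must be exactly two digits.
TWO_DIGIT_DAY = re.compile(
    r'[A-Za-z]{3} [A-Za-z]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}')

# Hypothetical variant: the day is two digits, or a space followed by one digit.
PADDED_DAY = re.compile(
    r'[A-Za-z]{3} [A-Za-z]{3} [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}')


def check(pattern, line):
    """Return the timestamp matched at the start of `line`, or None."""
    m = pattern.match(line)
    return m.group() if m else None


for name, pattern in [("DIGITS_ONLY", DIGITS_ONLY),
                      ("TWO_DIGIT_DAY", TWO_DIGIT_DAY),
                      ("PADDED_DAY", PADDED_DAY)]:
    for sample in SAMPLES:
        print(name, repr(check(pattern, sample)))
```

Because `re.match` anchors at the start of the string, the matched text is always the leading 24-character timestamp when a match is found.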


1 answer:

Answer 0 (score: 0)

I wrote this code, which may help you detect the pattern you need:

import re

#detecting Thu Jul 27 00:03:27 2017

line = 'Wed Aug 30 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.'

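# Raw strings keep regex escapes such as \d intact.
days = r'(?:Sat|Sun|Mon|Tue|Wed|Thu|Fri) '
# The log uses three-letter month abbreviations (e.g. "Jul", not "July").
months = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) '
# Single-digit days are space-padded in the log ("Aug  2").
day_number = r'[ \d]\d '
time = r'\d{1,2}:\d{1,2}:\d{1,2} '
year = r'\d{4} '
date = days + months + day_number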

pattern = date + time + year

date_matched = re.findall(date, line)
time_matched = re.findall(time, line)
year_matched = re.findall(year, line)
full_matched = re.findall(pattern, line)
print(date_matched, year_matched, time_matched , full_matched)

if len(full_matched) > 0:
  print('yes')
else:
  print('no')

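I used separate patterns for the weekday, month, day, time, and year. I am not very familiar with the re.match function, so I used re.findall instead. My priorities were simplicity and clarity, so a more efficient pattern or implementation is probably possible. I really hope this comes in handy.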
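
Since the question needs the date anchored at the start of each line, `re.match` with a single compiled pattern may be a closer fit than `re.findall`. The following is only a sketch under assumptions: the `[number] LEVEL: text` layout is inferred from the two sample lines, the names `LOG_LINE` and `parse_log` are hypothetical, the group names reuse the keys of the question's dict, and treating unmatched lines as continuations of the previous entry is a guess based on the commented-out line in the question.

```python
import re

# Assumed layout, inferred from the sample lines:
#   <24-char timestamp> [<number>] <LEVEL>: <text>
LOG_LINE = re.compile(
    r'(?P<date>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) '
    r'\[(?P<no>\d+)\] '
    r'(?P<type>[A-Z]+): '
    r'(?P<text>.*)'
)


def parse_log(lines):
    """Yield one dict per log entry from an iterable of lines.

    Lines that do not start with a timestamp are appended to the
    text of the entry before them.
    """
    current = None
    for raw in lines:
        line = raw.rstrip('\n')
        m = LOG_LINE.match(line)  # match() anchors at the start of the line
        if m:
            if current is not None:
                yield current
            current = m.groupdict()
        elif current is not None:
            # Continuation line without a leading timestamp.
            current['text'] += '\n' + line
    if current is not None:
        yield current


sample = [
    "Wed Aug 30 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.\n",
    "    continued detail\n",
    "Wed Aug  2 13:05:47 2017 [3163] INFO: Something new, the something you looking for is hidden. Update finished.\n",
]

entries = list(parse_log(sample))
for entry in entries:
    print(entry['date'], entry['no'], entry['type'])
```

Because `parse_log` accepts any iterable of lines, an open file object could be passed in directly, for example `with open(path) as f: for entry in parse_log(f): ...`, where `path` stands for the log file location.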

Good luck.