datetime.datetime format does not match

Time: 2016-06-16 12:56:27

Tags: python python-2.7

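I wrote the script below to compare the dates and times in a log file against the current time. The purpose of the script is:

It reads through a log file and compares the timestamp of each log line with the current timestamp. If a line was logged within the hour before the current time, the script displays that line.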

Sample log lines:

10.x.x.x - - [16/Jun/2016:09:28:58 -0300] "POST /xxxxx HTTP/1.1" 200 444
10.x.x.x - - [16/Jun/2016:09:29:02 -0300] "POST /xxxxx HTTP/1.1" 200 1483

The error I get is:

Current Time 2016-06-16 09:46:55.887691
LastHour 2016-06-16 08:46:55.887701
Traceback (most recent call last):
  File "log.py", line 41, in <module>
    log_date = datetime.datetime.strptime(match.group(2).rstrip(), "%d/%b/%Y:%H:%M").replace(year=datetime.date.today().year)
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '0/Apr/2016:00:00' does not match format '%d/%b/%Y:%H:%M'


import re
import os
import subprocess
import os
import datetime

LOG_FILE="access_log"

#xxxxxxxx - - [26/Apr/2016:14:38:52 -0300] "xxxxxxx HTTP/1.1" 200 357

get_date = re.compile('(.*)([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)(.*)')


current_time = datetime.datetime.now()
lastHourTime = datetime.datetime.now() - datetime.timedelta(hours = 1)

print ('Current Time %s' % current_time)
print ('LastHour %s' %lastHourTime)


def _read_log():

        with open (LOG_FILE,'r')as f:
                content=f.readlines()
        return content



if __name__ == '__main__':
        log_file=_read_log()

        for line in log_file:
                #Get only the date from the log line
                match=re.search(get_date,line)
                if match:
                  #Capture only the date field so that we can compare it with current_time and lastHourTime.
                  #log_date1= match.group(2)
                  #print log_date1
                  log_date = datetime.datetime.strptime(match.group(2).rstrip(), "%d/%b/%Y:%H:%M").replace(year=datetime.date.today().year)

                  #print ('Log Date %s' %log_date)
                  #Check if log_date is greater than lastHourTime and less than current_time
                  if  log_date < current_time and log_date > lastHourTime  :
                        print "Matching"
                        print line
                  else:
                        print "Not Matching"
                        print line


1 Answer:

Answer 0: (score: 0)

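The problem is in your regular expression.

Instead of capturing 26/Apr/2016:14:38 (the example in the commented line of your code), it captures 6/Apr/2016:14:38. The leading (.*) is greedy, so it consumes as many characters as it can, including the first digit of the day, and leaves only a single digit for the [0-9]+ that follows. As you can see, when the day is 10, 20, or 30, the captured day is 0 and strptime raises an exception; for any other two-digit day it silently parses the wrong date.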
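The greedy capture can be reproduced in isolation. The following is a minimal sketch using the pattern from the question; the two sample lines are hypothetical, modeled on the commented example in the script, and the second one is chosen to produce the same captured text as the traceback.

```python
import re
import datetime

# The pattern from the question, with the greedy (.*) in front.
greedy = re.compile(r'(.*)([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)(.*)')

# Hypothetical sample lines in the same format as the question's log.
line_26 = 'xxxxxxxx - - [26/Apr/2016:14:38:52 -0300] "xxxxxxx HTTP/1.1" 200 357'
line_10 = 'xxxxxxxx - - [10/Apr/2016:00:00:12 -0300] "xxxxxxx HTTP/1.1" 200 357'

captured_26 = greedy.search(line_26).group(2)
captured_10 = greedy.search(line_10).group(2)
print(captured_26)  # 6/Apr/2016:14:38  (the leading "2" went to group 1)
print(captured_10)  # 0/Apr/2016:00:00  (the leading "1" went to group 1)

# Day 26 parses without error, but as the wrong day.
wrong = datetime.datetime.strptime(captured_26, "%d/%b/%Y:%H:%M")
print(wrong.day)  # 6

# Day 10 is captured as "0", which %d rejects.
try:
    datetime.datetime.strptime(captured_10, "%d/%b/%Y:%H:%M")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```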

You can simplify the regular expression to ([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+) and change match.group(2) to match.group(1).

A simple example:

import re
import datetime

get_date = re.compile(r'([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)')

line = 'xxxxxxxx - - [26/Apr/2016:14:38:52 -0300] "xxxxxxx HTTP/1.1" 200 357'

match = re.search(get_date, line)

if match:
    # group(1) is the only capture group in the simplified pattern
    log_date = datetime.datetime.strptime(match.group(1).rstrip(), "%d/%b/%Y:%H:%M").replace(
                year=datetime.date.today().year)
    print(log_date)
    # Output when run in 2016: 2016-04-26 14:38:00
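Putting the fix back into the original task, the last-hour filter might look roughly like the sketch below. It is an illustration and not the asker's final script. The function name lines_in_last_hour, the extended pattern that also captures seconds, and the sample data (with the second timestamp changed to fall outside the window) are all assumptions. Like the original script, it ignores the -0300 offset and compares naive datetimes. Because %Y already parses the year from the log line, the .replace(year=...) call is left out here.

```python
import re
import datetime

# Assumed variant of the simplified pattern that also captures the seconds.
LOG_DATE = re.compile(r'([0-9]+/[A-Za-z]+/[0-9]+:[0-9]+:[0-9]+:[0-9]+)')


def lines_in_last_hour(lines, now):
    """Return the lines whose timestamp lies between (now - 1 hour) and now."""
    cutoff = now - datetime.timedelta(hours=1)
    recent = []
    for line in lines:
        match = LOG_DATE.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        log_date = datetime.datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        if cutoff < log_date < now:
            recent.append(line)
    return recent


# Hypothetical data: the first line is inside the window, the second is not.
sample = [
    '10.x.x.x - - [16/Jun/2016:09:28:58 -0300] "POST /xxxxx HTTP/1.1" 200 444',
    '10.x.x.x - - [16/Jun/2016:07:29:02 -0300] "POST /xxxxx HTTP/1.1" 200 1483',
]
now = datetime.datetime(2016, 6, 16, 9, 46, 55)  # fixed "current time" for the demo
recent = lines_in_last_hour(sample, now)
for line in recent:
    print(line)
```

Passing now as a parameter instead of calling datetime.datetime.now() inside the function keeps the filter easy to check against known timestamps.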