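I have written the script below to compare the dates and times in a log file. Basically, the script goes through the log file and compares the date on each line with the current timestamp. If a line was logged within the last hour before the current time, it displays that line.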
Sample log lines:

```
10.x.x.x - - [16/Jun/2016:09:28:58 -0300] "POST /xxxxx HTTP/1.1" 200 444
10.x.x.x - - [16/Jun/2016:09:29:02 -0300] "POST /xxxxx HTTP/1.1" 200 1483
```
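As a minimal sketch, assuming Python 3 (its `datetime.strptime` accepts `%z`; the Python 2.7 shown in the traceback does not), the bracketed timestamp in these lines can be parsed in full, including the seconds and the `-0300` offset. Anchoring the pattern on the `[` and `]` around the timestamp is an illustrative choice, not part of the original script.

```python
import re
from datetime import datetime

# First sample line from the question (client address masked as in the original)
line = '10.x.x.x - - [16/Jun/2016:09:28:58 -0300] "POST /xxxxx HTTP/1.1" 200 444'

# Anchor on the surrounding brackets so the day is captured with all of its digits
TIMESTAMP_RE = re.compile(r'\[(\d{1,2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')

match = TIMESTAMP_RE.search(line)
if match:
    # %z turns "-0300" into a fixed UTC offset, giving a timezone-aware datetime
    stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    print(stamp.isoformat())  # 2016-06-16T09:28:58-03:00
```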
This is the error I get:
```
Current Time 2016-06-16 09:46:55.887691
LastHour 2016-06-16 08:46:55.887701
Traceback (most recent call last):
  File "log.py", line 41, in <module>
    log_date = datetime.datetime.strptime(match.group(2).rstrip(), "%d/%b/%Y:%H:%M").replace(year=datetime.date.today().year)
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '0/Apr/2016:00:00' does not match format '%d/%b/%Y:%H:%M'
```
```python
import re
import os
import subprocess
import datetime

LOG_FILE = "access_log"

# xxxxxxxx - - [26/Apr/2016:14:38:52 -0300] "xxxxxxx HTTP/1.1" 200 357
get_date = re.compile('(.*)([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)(.*)')

current_time = datetime.datetime.now()
lastHourTime = datetime.datetime.now() - datetime.timedelta(hours=1)
print('Current Time %s' % current_time)
print('LastHour %s' % lastHourTime)


def _read_log():
    with open(LOG_FILE, 'r') as f:
        content = f.readlines()
    return content


if __name__ == '__main__':
    log_file = _read_log()
    for line in log_file:
        # Get only the date from the log line, e.g. Feb 7 07:33:19
        match = re.search(get_date, line)
        if match:
            # Capture only the date field so that we can compare it with current_time and lastHourTime
            # log_date1 = match.group(2)
            # print log_date1
            log_date = datetime.datetime.strptime(match.group(2).rstrip(), "%d/%b/%Y:%H:%M").replace(year=datetime.date.today().year)
            # print('Log Date %s' % log_date)
            # Check if log_date is greater than lastHourTime and less than current_time
            if log_date < current_time and log_date > lastHourTime:
                print("Matching")
                print(line)
            else:
                print("Not Matching")
                print(line)
```
Answer 0 (score: 0)
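The problem is your regular expression. Instead of capturing `26/Apr/2016:14:38` (the example in your commented-out line), it captures `6/Apr/2016:14:38`. The leading `(.*)` is greedy: it consumes as much of the line as it can while still letting the rest of the pattern match, so `[0-9]+` is left with only the last digit of the day. As a result, when the day is 10, 20 or 30 the captured day is `0` and `strptime` raises the exception you see; for days such as 16 or 26 it silently parses the wrong date.

You can simplify the regex to `([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)` and change `match.group(2)` to `match.group(1)`.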
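To see the greedy capture directly, the short Python 3 sketch below runs the question's pattern and the simplified one on two made-up lines in the question's format that differ only in the day. It then feeds the `0/Apr/...` capture to `strptime` to reproduce the `ValueError` from the traceback.

```python
import re
from datetime import datetime

# The question's pattern: the greedy (.*) in group 1 grabs all but one digit of the day
original = re.compile(r'(.*)([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)(.*)')
# The simplified single-group pattern from the answer
simplified = re.compile(r'([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)')

results = []
for day in ('26', '10'):
    # Made-up line in the question's format, varying only the day
    line = 'xxxxxxxx - - [%s/Apr/2016:14:38:52 -0300] "xxxxxxx HTTP/1.1" 200 357' % day
    pair = (original.search(line).group(2), simplified.search(line).group(1))
    results.append(pair)
    print(*pair)
# 6/Apr/2016:14:38 26/Apr/2016:14:38
# 0/Apr/2016:14:38 10/Apr/2016:14:38

# A day of "0" does not match %d, which is the ValueError from the traceback
try:
    datetime.strptime(results[1][0], "%d/%b/%Y:%H:%M")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```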
A simple example:
```python
import re
import datetime

get_date = re.compile(r'([0-9]+/[A-Z-a-z]+/[0-9]+:[0-9]+:[0-9]+)')
line = 'xxxxxxxx - - [26/Apr/2016:14:38:52 -0300] "xxxxxxx HTTP/1.1" 200 357'
match = re.search(get_date, line)
if match:
    # group(1) is the only capture group; the year is already part of it,
    # so no .replace(year=...) is needed
    log_date = datetime.datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M")
    print(log_date)  # 2016-04-26 14:38:00
```
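Putting the pieces together, here is a minimal Python 3 sketch of the whole filter. It is not the answerer's code: it assumes the same access-log layout, anchors the regex on the brackets, keeps the seconds and the `-0300` offset via `%z`, and compares against a timezone-aware "now" so the server's offset cannot skew the one-hour window. The helper name `lines_from_last_hour` is hypothetical, and the third sample line is made up to show a line being filtered out.

```python
import re
from datetime import datetime, timedelta, timezone

# Bracketed access-log timestamp, e.g. [16/Jun/2016:09:28:58 -0300]
TIMESTAMP_RE = re.compile(r'\[(\d{1,2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')


def lines_from_last_hour(lines, now=None):
    """Yield the lines stamped within the hour before `now` (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)  # aware, so it compares with the parsed stamps
    last_hour = now - timedelta(hours=1)
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if not match:
            continue  # no timestamp on this line
        # %z keeps the -0300 offset, so the comparison is done on real instants
        log_date = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if last_hour < log_date < now:
            yield line


# Demo: the question's two sample lines plus one made-up older line
sample = [
    '10.x.x.x - - [16/Jun/2016:09:28:58 -0300] "POST /xxxxx HTTP/1.1" 200 444\n',
    '10.x.x.x - - [16/Jun/2016:09:29:02 -0300] "POST /xxxxx HTTP/1.1" 200 1483\n',
    '10.x.x.x - - [16/Jun/2016:07:10:00 -0300] "POST /xxxxx HTTP/1.1" 200 100\n',
]
# Fixed "now": the question's current time, assuming the server runs at -03:00
now = datetime(2016, 6, 16, 9, 46, 55, tzinfo=timezone(timedelta(hours=-3)))
recent = list(lines_from_last_hour(sample, now))
print(len(recent))  # 2
```

On the real file, open `LOG_FILE` with `with open(LOG_FILE) as f:` and pass `f` as `lines`; iterating the file object this way also avoids loading the whole log into memory with `readlines()`.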