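Python - Search a text file for a specific time range (sed -n equivalent)

Asked: 2017-01-26 20:06:38

Tags: python sed

I am trying to write a Python script that prints a specific time range from a log file (similar to the sed command listed below):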

sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
2017-01-26 18:00:00
2017-01-26 18:01:01
2017-01-26 18:01:02
2017-01-26 18:01:09
2017-01-26 18:01:09
2017-01-26 18:01:11
2017-01-26 18:02:01 

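My Python script searches for fixed strings rather than a range like the sed command above (I suspect I am doing something wrong, but I cannot find the error; please check the code below):

Please point out where I should change the code, and suggest improvements. Thanks!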

#!/usr/bin/python
import datetime, time, os, sys, re
from datetime import timedelta
counter = 0
avgtime = 0

now = datetime.datetime.utcnow()
pasttime = now - datetime.timedelta(minutes=5)

timestamp = now.strftime("%y%m%d")
fiveago   = now - timedelta(minutes=5,seconds=now.second)
current   = now.strftime("%Y-%m-%d %H:%M")
pasttime  = fiveago.strftime("%Y-%m-%d %H:%M")
pattern   = str(current + "|" + pasttime)

f = open('/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log', 'r')
for line in f:
        if "POST" in line:
                if re.search(pattern, line, re.IGNORECASE):
                        date = line.split(' ')[1]
                        time = line.split(' ')[14]
                        avgtime += int(time)
                        counter += 1
                        print(date,time)
f.close()

print(pattern)
print("Total amount of time: ",counter)
print("Total scan time: ",avgtime)
print("Average scan time: ",avgtime / counter)

3 Answers:

Answer 0: (score: 0)

I don't see what the problem is, but you asked for a sed equivalent of your command, so here is an exact translation into Python:

import sys, re
use = False
for line in open('/logfile.log'):
   if re.search('2017-01-26 18:00', line): use = True
   if use: sys.stdout.write(line)
   if re.search('2017-01-26 18:02', line): use = False
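
The same idea can be wrapped in a reusable generator, which makes it easy to try on a few lines before pointing it at a real log. This is only a sketch: `sed_range` and `sample` are hypothetical names, and the sample lines are modeled on the timestamps in the question. One deliberate difference from the loop above: like sed with a regex end address, it starts looking for the end pattern only on the line after the one that matched the start pattern.

```python
import re

def sed_range(lines, start, end):
    """Yield lines from a match of `start` through the next match of `end`, inclusive."""
    printing = False
    for line in lines:
        if not printing and re.search(start, line):
            printing = True
            yield line
            continue  # the end pattern is only checked on later lines
        if printing:
            yield line
            if re.search(end, line):
                printing = False

# Hypothetical sample data in the same format as the question's log excerpt
sample = [
    "2017-01-26 17:59:58",
    "2017-01-26 18:00:00",
    "2017-01-26 18:01:01",
    "2017-01-26 18:02:01",
    "2017-01-26 18:02:05",
    "2017-01-26 18:03:00",
]
selected = list(sed_range(sample, "2017-01-26 18:00", "2017-01-26 18:02"))
for line in selected:
    print(line)
```

Note that the range closes at the first line matching the end pattern, so a later line in the same minute (18:02:05 here) is not included. If the end pattern never appears, everything up to the end of the input is yielded.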

Answer 1: (score: 0)

IIUC, you need to select the log entries that fall between two timestamps.

import sys
from datetime import datetime, timedelta

now = datetime.utcnow()

timestamp = now.strftime("%y%m%d")
fiveago   = now - timedelta(minutes=5, seconds=now.second)
current   = now.strftime("%Y-%m-%d %H:%M")
pasttime  = fiveago.strftime("%Y-%m-%d %H:%M")

print("Start time: ", pasttime, "End time: ", current, "\n\n")

filename = '/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log'
with open(filename, 'r') as f:
    for line in f:
        if "POST" in line:
            fields = line.split(' ')
            # In the sample input below, field 1 is the date and field 2 is the time
            logdatetime = fields[1] + " " + fields[2]

            # Zero-padded timestamps in this format compare correctly as plain strings
            if pasttime <= logdatetime <= current:
                print("yes, within the interval : ", logdatetime)

Output:

Start time:  2017-01-26 20:23 End time:  2017-01-26 20:28 


yes, within the interval :  2017-01-26 20:23:20
yes, within the interval :  2017-01-26 20:23:01
yes, within the interval :  2017-01-26 20:23:02

For this input:
POST 2017-01-26 20:23:20 XX
POST 2017-01-26 20:23:01 XC
POST 2017-01-26 20:23:02 CV
POST 2017-01-26 20:20:09 DAF
POST 2017-01-26 20:20:09 fASF
POST 2017-01-26 20:20:11 Sfas
POST 2017-01-26 20:20:01 fsAf
POST 2017-01-26 20:20:02 asf
POST 2017-01-26 20:20:03 asf

Answer 2: (score: 0)

The problem with your solution is that you only look for the two "edge times". In your example with a 3-minute time range, those are 18:00 and 18:02.
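
A quick check makes this visible. The sketch below uses the two edge minutes from the sed example as the pattern and a few made-up lines in the same timestamp format; `lines`, `matched` and `missed` are names chosen for illustration only.

```python
import re

# Only the two "edge" minutes, joined with "|" the way the question's script builds its pattern
pattern = "2017-01-26 18:00|2017-01-26 18:02"

# Hypothetical log lines modeled on the question's sample output
lines = [
    "2017-01-26 18:00:00",
    "2017-01-26 18:01:01",
    "2017-01-26 18:01:09",
    "2017-01-26 18:02:01",
]

matched = [line for line in lines if re.search(pattern, line)]
missed = [line for line in lines if not re.search(pattern, line)]

print("matched:", matched)
print("missed: ", missed)
```

Every line stamped 18:01 ends up in `missed`, because neither alternative in the pattern mentions that minute.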

What the sed command does is:

sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
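  1. Go through the lines without printing them (-n).
  2. As soon as sed finds a line matching 2017-01-26 18:00, it starts printing every line.
  3. As soon as sed finds a line matching 2017-01-26 18:02, it prints that line and then stops printing.

In your example, your regex pattern is: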

    2017-01-26 18:00|2017-01-26 18:02
    

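This only matches lines stamped 18:00 or 18:02. So you can do one of the following:

  1. Parse the date on each line and compare it with the time range, as in Shijo's answer.
  2. Mimic sed, as in theamk's answer, but note that this only works if both "edge timestamps" are present in the file.
  3. Extend your regex so that it also matches the minutes in between: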

      pattern = "|".join([(now-timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M") for i in range(6)])
      

This will produce, for example:

      '2017-01-26 18:00|2017-01-26 17:59|2017-01-26 17:58|2017-01-26 17:57|2017-01-26 17:56|2017-01-26 17:55'
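
For the first option (parse and compare), a minimal sketch might look like the following. It assumes each line begins with a `YYYY-MM-DD HH:MM:SS` timestamp, which is not the field layout of the real log in the question, so the slice would need adjusting there. `in_range`, `log` and the fixed `start`/`end` values are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta

def in_range(line, start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if the timestamp at the start of `line` lies in [start, end)."""
    try:
        stamp = datetime.strptime(line[:19], fmt)
    except ValueError:
        return False  # line does not begin with a timestamp in the assumed format
    return start <= stamp < end

# Fixed window for the demonstration; a real script would derive it from the current time
end = datetime(2017, 1, 26, 18, 3)
start = end - timedelta(minutes=3)

# Hypothetical log lines
log = [
    "2017-01-26 17:59:59 GET /a",
    "2017-01-26 18:00:00 POST /b",
    "2017-01-26 18:01:09 POST /c",
    "2017-01-26 18:02:01 POST /d",
    "2017-01-26 18:03:00 POST /e",
    "not a log line",
]
kept = [line for line in log if in_range(line, start, end)]
for line in kept:
    print(line)
```

Comparing `datetime` objects does not depend on either edge timestamp being present in the file, and the window can be any length without growing a regex.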