How can I use a regular expression to find the lines in a log file that start with 5 and fall between a user-supplied start time and end time (timestamp format e.g. 14/05/02 02:30:00)?
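I need a script that goes through every line of my log file and filters on 3 parameters:
1) Start time (entered by the user), e.g. 14/05/02 02:30:00
2) End time (entered by the user), e.g. 14/05/02 02:45:00
3) The line starts with the number "5"
Sample lines from my log file: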
9,14/05/02 02:30:00,1,1,94767539135,94767539135,0,1,172839,0,1,172839,,14/05/02 02:30:00,9477000003,,,,,93,14/05/02 03:30:00,0,0,9477000008,,false,,,,,,,,false,0,5011405020230005756,67000,
5,14/05/02 02:30:00,1,1,94776082043,94776082043,0,1,77100,0,1,77100,,14/05/02 02:30:00,9477000003,,,,,19,14/05/05 02:30:00,0,0,9477000007,9477000003,false,,,,,,,,true,,,0,,5011405020230005752,
11,14/05/02 02:30:00,94776082043,1,9477000051,,,5011405020230005752,
12,14/05/02 02:30:00,true,false,9477000008,413025705057121,,,,5011405020230005748,
3,14/05/02 02:30:00,1,1,94713784377,0,1,1,94771653521,0,1,0713784377,,14/05/02 02:29:48,9477000003,413021500734521,,,,0,14/05/05 02:29:50,,,9477000006,9477000006,,,,,,,,,,,,,0,5011405020229484460,
9,14/05/02 02:30:00,1,1,94771969046,94771969046,0,1,776236,0,1,776236,,14/05/02 02:30:00,9477000003,,,,,62,14/05/05 02:30:00,0,0,9477000008,,false,,,,,,,,false,0,5011405020230005763,67000,
5,14/05/02 02:30:00,1,1,94771059909,94771059909,1,1,94776716217,1,1,94776716217,,14/05/02 02:29:57,9477000003,413020776716217,,,,54,14/05/05 02:29:55,0,0,9477000006,9477000047,false,,,,,,,,false,,,0,,5011405020229575408,
Here is part of the code I tried:
#!/usr/bin/env python
# Python 2 script (uses raw_input and the print statement)
import re
from datetime import datetime

count = 0
fh = open(r"/home/harzyne/pythonscripts/read_log_file.txt")
yyyy,mo,dd,hh,mm = raw_input("Enter Start_Time in format(yy,mm,dd,hh,mm)").split(',')
yyyy1,mo1,dd1,hh1,mm1 = raw_input("Enter End_Time in format(yy,mm,dd,hh,mm)").split(',')
# Count the lines that start with 5
for i in fh:
    if re.search('^5',i):
        count +=1
print count
try:
    #start_t = datetime(2014,5,2,02,30)
    #end_t = datetime(2014,5,2,02,45)
    start_t = datetime(int(yyyy),int(mo),int(dd),int(hh),int(mm))
    end_t = datetime(int(yyyy1),int(mo1),int(dd1),int(hh1),int(mm1))
    diff = end_t - start_t
except ValueError:
    print ("invalid argument")
#start = raw_input("Enter Start_Time in format(yyyy,mm,dd,hh,mm) ")
#end = raw_input("Enter End_Time in format(yyyy,mm,dd,hh,mm)")
# Messages per second over the requested window
no_of_msg_per_sec = float(count)/diff.seconds
print no_of_msg_per_sec
Answer 0 (score: 1)
Here is an example of how to build the search pattern and count the lines:
#!/usr/bin/python
import re
s = '''9,14/05/02 02:30:00,1,1,94767539135,94767539135,0,1,172839,0,1,172839...
5,14/05/02 02:30:00,1,1,94776082043,94776082043,0,1,77100,0,1,77100,,14/05/0...
11,14/05/02 02:30:00,94776082043,1,9477000051,,,5011405020230005752,
12,14/05/02 02:30:00,true,false,9477000008,413025705057121,,,,50114050202300...
3,14/05/02 02:30:00,1,1,94713784377,0,1,1,94771653521,0,1,0713784377,,14/05/...
9,14/05/02 02:30:00,1,1,94771969046,94771969046,0,1,776236,0,1,776236,,14/05...
5,14/05/02 02:29:59,1,1,94771059909,94771059909,1,1,94776716217,1,1,94776...'''
start_sb = r'14/05/02 02:29:59'
end_sb = r'14/05/02 02:30:00'
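# With re.M, ^ anchors at the start of every line, not just the start of the string
p = re.compile(r'^5,' + end_sb + r',.*\n([\s\S]*?)^5,' + start_sb + r',', re.M)
m = p.search(s)
if m:
    # Group 1 holds the lines between the two boundary lines
    print(m.group(1).count("\n"))
else:
    print('no result')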
The idea is to put everything between the two boundary lines into a capture group, then count the newlines in that group.
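The same idea can be wrapped in a small function. This is only a sketch in Python 3: the name `count_lines_between`, its parameters, and the shortened sample text are made up for illustration, and `re.escape` is added so that a timestamp can never be misread as pattern syntax.

```python
import re

def count_lines_between(text, first_stamp, last_stamp, prefix="5"):
    """Count the lines strictly between two boundary lines.

    A boundary line starts with `prefix,` followed by the given timestamp.
    Returns None when no such pair of boundary lines is found.
    """
    boundary = r"^" + re.escape(prefix) + ","
    pattern = re.compile(
        boundary + re.escape(first_stamp) + r",.*\n"
        + r"([\s\S]*?)"
        + boundary + re.escape(last_stamp) + ",",
        re.M,
    )
    match = pattern.search(text)
    if match is None:
        return None
    # Every line captured in group 1 ends with exactly one newline.
    return match.group(1).count("\n")

# Shortened, made-up sample in the same layout as the log lines.
sample = (
    "5,14/05/02 02:30:00,a\n"
    "11,14/05/02 02:30:00,b\n"
    "12,14/05/02 02:30:00,c\n"
    "5,14/05/02 02:45:00,d\n"
)
print(count_lines_between(sample, "14/05/02 02:30:00", "14/05/02 02:45:00"))  # 2
```

Note that the two boundary lines themselves are not counted, and that the result depends on the order in which the boundary lines appear in the file.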
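About the pattern itself:
.* matches every character up to the end of the line (the dot does not match a newline).
[\s\S] is a well-known trick for matching any character, including newlines.
([\s\S]*?) is capture group 1; it uses a lazy quantifier to grab everything up to the first line that starts with 5 followed by the start timestamp.
The re.M option (MULTILINE) changes the meaning of the ^ anchor from "start of the string" to "start of a line".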
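To see these pieces in isolation, here is a small Python 3 sketch on a made-up three-line string (the variable names are only for illustration):

```python
import re

text = "5,first\n9,middle\n5,last\n"

# Without re.M, ^ only matches at the very start of the string.
starts_plain = re.findall(r"^5", text)
# With re.M, ^ also matches right after each newline.
starts_multiline = re.findall(r"^5", text, re.M)

# The dot stops at a newline, so .* stays on the first line.
dot_match = re.search(r"5,.*", text).group(0)
# [\s\S] crosses newlines; the lazy *? stops at the next line starting with "5,".
lazy = re.search(r"5,([\s\S]*?)^5,", text, re.M).group(1)

print(len(starts_plain), len(starts_multiline))  # 1 2
print(repr(dot_match))                           # '5,first'
print(repr(lazy))                                # 'first\n9,middle\n'
```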
Answer 1 (score: 0)
import re
text = '''9,14/05/02 02:30:00,1,1,94767539135,94767539135,0,1,172839,0,1,172839...
5,14/05/02 02:30:00,1,1,94776082043,94776082043,0,1,77100,0,1,77100,,14/05/0...
11,14/05/02 02:30:00,94776082043,1,9477000051,,,5011405020230005752,
12,14/05/02 02:30:00,true,false,9477000008,413025705057121,,,,50114050202300...
3,14/05/02 02:30:00,1,1,94713784377,0,1,1,94771653521,0,1,0713784377,,14/05/...
9,14/05/02 02:30:00,1,1,94771969046,94771969046,0,1,776236,0,1,776236,,14/05...
5,14/05/02 02:29:59,1,1,94771059909,94771059909,1,1,94776716217,1,1,94776...'''
start = r'14/05/02 02:29:59'
end = r'14/05/02 02:30:00'
regex = r'(^5.*(?:' + start + '|' + end + ').*$)'
matches = re.findall(regex, text, re.M)
print(matches)
This matches any line that starts with 5 and contains either the start or the end timestamp.
So count will be len(matches).
Output:
['5,14/05/02 02:30:00,1,1,94776082043,94776082043,0,1,77100,0,1,77100,,14/05/0...',
'5,14/05/02 02:29:59,1,1,94771059909,94771059909,1,1,94776716217,1,1,94776...']
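If lines with any timestamp inside the window should count, not only those equal to the start or end value, one option is to let the regex extract the timestamp and do the comparison with datetime. The following is a rough Python 3 sketch under some assumptions: the timestamp layout is yy/mm/dd hh:mm:ss, only the first timestamp field of each line is checked, and `lines_in_window`, `STAMP_FORMAT`, `LINE_RE` and the sample lines are hypothetical names and data.

```python
import re
from datetime import datetime

STAMP_FORMAT = "%y/%m/%d %H:%M:%S"  # assumed layout, e.g. 14/05/02 02:30:00
# Record type 5, then the first timestamp field of the line.
LINE_RE = re.compile(r"^5,(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),")

def lines_in_window(lines, start_text, end_text):
    """Return the type-5 lines whose first timestamp lies in [start, end]."""
    start = datetime.strptime(start_text, STAMP_FORMAT)
    end = datetime.strptime(end_text, STAMP_FORMAT)
    selected = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a type-5 line
        stamp = datetime.strptime(match.group(1), STAMP_FORMAT)
        if start <= stamp <= end:
            selected.append(line.rstrip("\n"))
    return selected

# Shortened, made-up lines; a real script would iterate over the open file.
sample = [
    "9,14/05/02 02:30:00,1,1\n",
    "5,14/05/02 02:30:00,1,1\n",
    "5,14/05/02 02:44:59,1,1\n",
    "5,14/05/02 02:50:00,1,1\n",
    "50,14/05/02 02:31:00,1,1\n",
]
hits = lines_in_window(sample, "14/05/02 02:30:00", "14/05/02 02:45:00")
window = (datetime.strptime("14/05/02 02:45:00", STAMP_FORMAT)
          - datetime.strptime("14/05/02 02:30:00", STAMP_FORMAT))
print(len(hits))  # 2
# Messages per second over the whole window.
print(len(hits) / window.total_seconds())
```

Two details differ from the snippets above: the pattern `^5,` includes the comma, so a line whose first field is 50 is not picked up, and `total_seconds()` is used instead of `.seconds`, because `.seconds` drops whole days when the window is longer than 24 hours.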