对于成功的请求,日志文件已编写如下
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
以o
开头并具有some data
(如上所示)的行,并以以o
开头的另一行结尾。这就是模式。
如果有多个此类请求,日志文件将继续追加,如下所示
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
文件中可能生成了一些错误的数据,可能适用于任何较早的请求或最新的请求。
如果生成的错误数据不是针对最新请求的,则可以忽略。
示例:
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# Should be indication of request i.e., line beginning with o, followed some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
如果生成的错误数据是针对最新请求的,则应将其捕获并突出显示。
示例:
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# No line present i,e., (o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd)
缺失行可以是以o
开头的第一行或以o
开头的最后一行
我需要检查一下,对于每个请求日志都是以这种格式编写的,文件中捕获了多少成功和不成功的请求?
方法1:可以读取文件内容,然后解析,好像行以o等开头,这是我认为不可行的
方法2:我觉得,reg-ex是最佳的最佳解决方案。
哪个最好?您能帮我实现它吗?
到目前为止已尝试:
reg_ex1 = "o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+-"
reg_ex2 = "o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+[a-zA-Z0-9_]+"
with open(""some_file.log, 'r') as content_file:
content = content_file.read()
pattern1 = re.compile(reg_ex1)
begin_lines = len(pattern1.findall(content))
pattern2 = re.compile(reg_ex2)
end_lines = len(pattern2.findall(content))
if begin_lines == end_lines:
print "File has successful requests captured"
else:
print "File has un-successful requests captured"
# If wrong-data generated is not for latest request, can be ignored.
# If wrong-data generated is for latest request, it should be caught and highlighted.
May be not a good idea though, please let me know.
UPD :
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456===123 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573==123 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456===456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573===456 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456===789 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1 374513
n G-809573===789 1452830400 1 926733
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
对于上面的文本,我想提取数据包1和3。
答案 0 :(得分:2)
要检查文件是good
还是bad
,请考虑文件的第一行和最后一行;
o
开头,则文件不正确o
结尾,则文件不正确o
开始list.txt:
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# Should be indication of request i.e., line beginning with o, followed some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfd
因此:
logFile = "list.txt"
with open(logFile) as f:
content = f.readlines()
# you may also want to remove empty lines
content = [l.strip() for l in content if l.strip()]
for line in content:
if line.startswith("o"): # check if the first line starts with o
if str(content[-1]).strip("[']").split()[0] == 'o': # check if last line starts with o
print("File is good.")
else:
print("File is bad.")
break
else: # end if the first line does not start with o
print("File is bad.")
break
编辑:
要获取在{strong>有效对o
之间的所有响应:
list.txt:
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1 374513
n G-809573 1452830400 1 926733
因此:
import re
def GetTheResponses(infile):
with open(infile) as fp:
red = fp.read()
for result in re.findall('o (.*?)o ', red, re.S):
print(result)
GetTheResponses('list.txt')
输出:
123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 926731
123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573 1452830400 1 926732
编辑2 :(以提高可读性):
count = 1
for result in re.findall('o (.*?)o ', red, re.S):
print("Response Packet: {}".format(count))
print("\n".join(result.split("\n")[1:]))
count +=1
输出:
Response Packet: 1
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 926731
Response Packet: 2
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573 1452830400 1 926732
答案 1 :(得分:0)
我会明确建议方法1,为什么..? 这样,我们就可以灵活地读取/重复每一行。
with open('file.txt', r) as fp:
line = fp.readline()
print(type(line)) #string
#do anything with line(string)
#1. split_list= fp.split() -- list of values separated by space
#2. Check type of each element:
# split_list[0].isalpha(),
# split_list[0].isalpha(),
# split_list[0].isdigit(),
# split_list[0].isspace() like so, and then do required adding to final dict/list..
确保尝试:捕获,每一步..因为日志文件是不可预测的。
答案 2 :(得分:0)