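I have several text files that contain a stream of text data. Some header lines break the data up by count. The problem is that the header for the block of data I'm interested in may be in a different file. It looks like this...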
FILE1.TXT
=======Boot Count 1============
(random text strings)
...
...
...
=======Boot Count 2============
...
FILE2.TXT
...
...
...
=======Boot Count 3============
...
...
=======Boot Count 4============
...
file3.txt
...
...
I need to find some information that is located in the latest boot count. So I need to
I can handle #4. Any ideas on 1-3?
Answer 0 (score: 1)
Just go through each file and find the one with the latest count:
import re
from itertools import islice

with open("file1.txt") as f1, open("file2.txt") as f2, open("file3.txt") as f3:
    best_count, index, f_obj = 0, 0, None
    r = re.compile(r"Boot\s+Count\s+(\d+)")
    for obj in (f1, f2, f3):
        for ind, line in enumerate(obj, 1):
            match = r.search(line)
            if match:
                # group(1) is the captured number; group() is the whole match
                i = int(match.group(1))
                if i > best_count:
                    best_count = i
                    index = ind
                    f_obj = obj
    if f_obj is not None:
        f_obj.seek(0)
        # skip the first `index` lines, up to and including the header
        for line in islice(f_obj, index, None):  # search for the string
            print(line)
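best_count, index and f_obj keep track of where the latest count is and which file it is in. You can then seek back to the start of that file and use itertools.islice to pull out the part you want from the file with the latest count.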
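The same idea can be wrapped in a function that takes any number of files instead of exactly three. This is only a minimal sketch, assuming the header format shown in the question; the name `latest_boot_section` and its `paths` parameter are hypothetical and not part of the answer above.

```python
import re
from itertools import islice

BOOT_RE = re.compile(r"Boot\s+Count\s+(\d+)")

def latest_boot_section(paths):
    """Return the lines after the highest 'Boot Count N' header,
    taken from the file that contains that header (hypothetical helper)."""
    best_count, best_index, best_path = 0, 0, None
    for path in paths:
        with open(path) as f:
            for ind, line in enumerate(f, 1):
                match = BOOT_RE.search(line)
                if match and int(match.group(1)) > best_count:
                    best_count = int(match.group(1))
                    best_index = ind  # 1-based line number of the header
                    best_path = path
    if best_path is None:
        return []  # no header found in any file
    with open(best_path) as f:
        # Skip everything up to and including the header line.
        return [line.rstrip("\n") for line in islice(f, best_index, None)]
```

Like the code above, this only returns the rest of the file that holds the header. If the block continues in a later file, those lines would still have to be appended separately.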
If the lines with the count are the only ones that always start with =, you can also use if line[0] == "=" to speed up the search.
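One way to apply that prefilter is to test the first character before running the regular expression at all. The sketch below assumes every header line starts with "=" as in the sample data; the helper name `boot_count` is made up for illustration. It uses `str.startswith` rather than `line[0]` so that an empty string does not raise an IndexError.

```python
import re

BOOT_RE = re.compile(r"Boot\s+Count\s+(\d+)")

def boot_count(line):
    """Return the boot count from a header line, or None for any other line."""
    # Cheap prefix test first; assumes headers always start with "=".
    if not line.startswith("="):
        return None
    match = BOOT_RE.search(line)
    return int(match.group(1)) if match else None
```

Lines that do not begin with "=" are rejected without touching the regex, so any line that mentions a boot count in the middle of other text is deliberately ignored.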
Answer 1 (score: 0)
I found a way to do what I need. It is similar to Padraic's approach.
import fileinput
import glob
import re
from itertools import islice

def issue(path):
    # path is a full path with a wildcard character,
    # example: "C:\users\joeShmoe\file*"
    boot_re = re.compile(r'(BOOT COUNT =)')
    failure_re = re.compile(r'FAILURE HERE')
    linenumber = 0
    fileList = sorted(glob.glob(path))
    fileindex = 1
    bootFound = False
    for file in fileList:
        if bootFound:
            break
        fileindex += 1
        # scan each file from its last line backwards
        with open(file) as f:
            for line in reversed(f.readlines()):
                linenumber += 1
                if boot_re.search(line.rstrip()) is not None:
                    bootFound = True
                    break
    if not bootFound:
        return None
    filesearch = sorted(fileList[:fileindex], reverse=True)
    lines = [line.strip() for line in fileinput.input(files=filesearch)]
    if len(lines) <= 0:
        return None
    # linenumber counts back from the end of the combined lines
    startpt = max(len(lines) - linenumber, 0)
    for line in islice(lines, startpt, len(lines)):
        if failure_re.search(line.rstrip()) is not None:
            return 1
    return None