Question

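I have several text files that together contain one stream of text data. Header lines split the data up by boot count. The problem is that the header for the chunk I'm interested in may be in a different file. It looks like this...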

FILE1.TXT

=======Boot Count 1============
(random text strings)
...
...
...
=======Boot Count 2============
...

FILE2.TXT

...
...
...
=======Boot Count 3============
...
...
=======Boot Count 4============
...

file3.txt

...
...

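I need to find some information that sits in the most recent boot count section. So I need to:

1. concatenate the text files together
2. search backwards until I see a boot count header
3. trim off everything before it
4. search only that last section for a specific string.

I can handle #4. Any ideas for 1-3?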
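Steps 1-3 could look something like the sketch below. It assumes the files are passed in chronological order (FILE1.TXT, FILE2.TXT, file3.txt) and that every header has the `=======Boot Count N============` shape shown in the samples; `read_concatenated` and `last_boot_section` are hypothetical names, not from any library.

```python
import re

# Assumed header format, taken from the sample files: "=======Boot Count 4============"
HEADER_RE = re.compile(r"^=+\s*Boot\s+Count\s+(\d+)\s*=+\s*$")


def read_concatenated(paths):
    """Step 1: join the files into one list of lines.

    Assumes `paths` is already in chronological order.
    """
    lines = []
    for path in paths:
        with open(path) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def last_boot_section(lines):
    """Steps 2-3: walk backwards to the last header and drop everything before it."""
    for i in range(len(lines) - 1, -1, -1):
        if HEADER_RE.match(lines[i]):
            return lines[i:]  # the header plus everything after it
    return []  # no header found in any file


# Usage (hypothetical file names from the question):
# section = last_boot_section(read_concatenated(["FILE1.TXT", "FILE2.TXT", "file3.txt"]))
```

This reads every file fully into memory, which keeps the backward search trivial and is fine for logs of modest size; step 4 then only has to scan `section`.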

Answer 1

Just go through each file and find the one with the latest count:

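import re
from itertools import islice

# Matches header lines such as "=======Boot Count 4============"
r = re.compile(r"Boot\s+Count\s+(\d+)")

with open("file1.txt") as f1, open("file2.txt") as f2, open("file3.txt") as f3:
    best_count, index, f_obj = 0, 0, None
    for obj in (f1, f2, f3):
        for ind, line in enumerate(obj, 1):
            match = r.search(line)
            if match:
                i = int(match.group(1))  # group(1) holds only the digits
                if i > best_count:
                    best_count = i
                    index = ind          # 1-based line number of the header
                    f_obj = obj
    if f_obj is not None:
        f_obj.seek(0)
        # skip up to and including the header, then read the rest of that file
        for line in islice(f_obj, index, None):  # search for the string
            print(line.rstrip("\n"))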

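best_count, index and f_obj keep track of the latest count, the line it is on and the file it is in. You can then seek back to the start of that file and use itertools.islice to take the section you need from the file with the latest count.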

If the lines with a count always start with "=", you can also speed up the search by checking if line[0] == "=" before running the regex.
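A minimal sketch of that prefilter, assuming every header line starts with "=". It uses `line.startswith("=")` rather than `line[0] == "="` so an empty string cannot raise IndexError; `find_latest` is a hypothetical helper that takes (name, lines) pairs so it can be fed open files or plain lists.

```python
import re

# Header pattern from the question's sample files.
COUNT_RE = re.compile(r"Boot\s+Count\s+(\d+)")


def find_latest(files):
    """Return (name, line_number, count) for the highest boot count seen.

    `files` is an iterable of (name, lines) pairs; line numbers are 1-based.
    Returns (None, 0, 0) if no header is found.
    """
    best = (None, 0, 0)
    for name, lines in files:
        for ind, line in enumerate(lines, 1):
            if not line.startswith("="):
                continue  # cheap test first; most log lines are not headers
            match = COUNT_RE.search(line)
            if match and int(match.group(1)) > best[2]:
                best = (name, ind, int(match.group(1)))
    return best
```

With the files from the answer open, it could be called as `find_latest((f.name, f) for f in (f1, f2, f3))`.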

Answer 2

I found a way to do what I need. It's similar to Padraic's approach.

import fileinput
import glob
import re
from itertools import islice

BOOT_RE = re.compile(r'BOOT COUNT =')
FAILURE_RE = re.compile(r'FAILURE HERE')

def issue(path):
    # path is a full path with a wildcard character, for example:
    # r"C:\users\joeShmoe\file*"
    # Files are scanned in sorted order and the first one containing a header
    # is treated as holding the latest boot count.

    linenumber = 0      # lines counted from the end, up to and including the header
    fileindex = 0       # number of files that had to be scanned
    bootFound = False

    fileList = sorted(glob.glob(path))

    for name in fileList:
        fileindex += 1
        with open(name) as f:
            # walk this file from its last line towards its first
            for line in reversed(f.readlines()):
                linenumber += 1
                if BOOT_RE.search(line):
                    bootFound = True
                    break
        if bootFound:
            break

    if not bootFound:
        return None

    # put the scanned files back in reverse sorted order and join them
    filesearch = sorted(fileList[:fileindex], reverse=True)
    lines = [line.strip() for line in fileinput.input(files=filesearch)]

    if not lines:
        return None

    # index of the header line in the concatenated list
    startpt = max(len(lines) - linenumber, 0)

    for line in islice(lines, startpt, None):
        if FAILURE_RE.search(line):
            return 1

    return None
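To see why `startpt = len(lines) - linenumber` lands on the header, here is a small worked example with made-up in-memory lines instead of files. The data is hypothetical, and it assumes, as the answer's reverse sort implies, that the first file in sorted order is the newest one.

```python
# Hypothetical data: the newest file comes first in sorted order.
newest = ["BOOT COUNT = 4", "ok", "FAILURE HERE"]
older = ["BOOT COUNT = 3", "ok"]

# Backward scan of the newest file: count lines up to and including the header.
linenumber = 0
for line in reversed(newest):
    linenumber += 1
    if line.startswith("BOOT COUNT ="):
        break

# Concatenate with the newest file last, as the reverse sort does.
lines = older + newest
startpt = len(lines) - linenumber   # 5 - 3 = 2, the index of "BOOT COUNT = 4"
last_section = lines[startpt:]
```

Because `linenumber` is measured from the end, the offset stays correct no matter how many older files are placed in front of the newest one.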

Python 2.7: concatenate, trim and search text files
