Question

I'd like to know whether there is a simple way to run a nested string search over a huge text file.

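I have a text file that may contain a line of text highlighting a specific problem area. I'm looking into a nested search, but I want to avoid running a completely new search over the whole text file for the related second string. Instead, I want to continue from the point where the first string matched.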

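For example, if I run a string search on the text file and find "problem string", I then want to run a secondary search, or preferably continue the search (starting from the problem line), to find the first match of a second search string. In my case, the second search string is the nearest "GPS INFO" string, after which I collect the GPS information from the text file (i.e., the next GPS string following the first "problem string").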

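I hope this makes sense! Basically, I want to avoid a completely new search of the text file and instead continue searching from where the first string was found.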

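I have some code below, but it only finds the first string. To find the second string I would normally start a new search, but that doesn't guarantee I find the next consecutive one.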
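The core idea can be shown in a few lines: a Python file object is an iterator, so if both searches share the same iterator, the second search resumes right after the line where the first one stopped. This is a sketch only; find_next is a hypothetical helper name, and io.StringIO stands in for the real file.

```python
import io


def find_next(lines, needle):
    """Return the first remaining line containing `needle`, or None.

    `lines` must be an iterator (e.g. an open file object); every line
    examined is consumed, so a later call continues from this point.
    """
    for line in lines:
        if needle in line:
            return line.rstrip("\n")
    return None


# io.StringIO stands in for open("file.txt"); both support line iteration.
sample = io.StringIO(
    "GPS INFO: 1\n"
    "BIG Problem Line 42\n"
    "random\n"
    "GPS INFO: 2\n"
    "GPS INFO: 3\n"
)

problem = find_next(sample, "BIG Problem Line")  # consumes lines up to the problem
gps = find_next(sample, "GPS INFO")              # resumes after the problem line
print(problem)  # BIG Problem Line 42
print(gps)      # GPS INFO: 2
```

Note that "GPS INFO: 1" is skipped because it appears before the problem line; with a real file you would pass the object returned by open() to both calls.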

with open(file, "r") as f:  # `file` holds the path to the text file
    searchlines = f.readlines()
searchstringsProblem = ['BIG Problem Line']
searchstringsGPSLoc = ['GPS INFO']

a = 0
tot = 0
row_num = 0  # let it be current row number

# `worksheet` and `bold` are an Excel worksheet and a bold cell format created elsewhere
while a < len(searchstringsProblem):
    for i, line in enumerate(searchlines):
        for word in searchstringsProblem:
            if word in line:
                prob = line.split()
                worksheet.write(2, 0, "Problem ID:", bold)
                worksheet.write(2, 1, prob[5])  # sixth word of the line is the problem ID
                break
    a = a + 1

Below is a sample of the GPS INFO lines and the stats I want to collect from them:

Key line2 GPS Info

GPS = Active

Longitude = -0.00000

Latitude = +51.47700
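The sample above is a series of key = value lines separated by blank lines. Assuming every stat line follows that shape, a small parser can turn such a block into a dictionary. parse_gps_block is a hypothetical helper name, and stopping once both Longitude and Latitude have been read is an assumption about the layout.

```python
import io


def parse_gps_block(lines):
    """Collect `key = value` pairs after a GPS header, skipping blank lines.

    Stops as soon as both Longitude and Latitude have been read, so the
    rest of the iterator is left untouched for further searching.
    """
    stats = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key = value
        key, value = line.split("=", 1)
        stats[key.strip()] = value.strip()
        if "Longitude" in stats and "Latitude" in stats:
            break
    return stats


block = io.StringIO(
    "Key line2 GPS Info\n"
    "\n"
    "GPS = Active\n"
    "\n"
    "Longitude = -0.00000\n"
    "\n"
    "Latitude = +51.47700\n"
)
next(block)  # skip the "Key line2 GPS Info" header line
print(parse_gps_block(block))
# {'GPS': 'Active', 'Longitude': '-0.00000', 'Latitude': '+51.47700'}
```

Values are kept as strings; convert them with float() if you need numbers.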

Thanks for looking.

MikG

Answer 1

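You can keep track of lines by iterating over the file line by line instead of putting them all into a list with .readlines(). The following might suit your needs (it finds all problem/GPS pairs; note that a problem line with no GPS line after it will not be reported):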

File:

random
random
random
GPS INFO: 238939
random
BIG Problem Line
random
blah GPS INFO: 238490
random GPS INFO: 325236342
BIG Problem Line2
GPS INFO: 12343

Code:

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS INFO'
matches = []

with open("test.txt") as f:
    problem = False
    problem_line = ""

    for line in f:
        if not problem and searchstringsProblem in line:
            problem_line = line.strip()
            problem = True
        elif problem and searchstringsGPSLoc in line:
            matches.append((problem_line, line.strip()))
            problem = False

print(matches)

This gives us:

[('BIG Problem Line', 'blah GPS INFO: 238490'), ('BIG Problem Line2', 'GPS INFO: 12343')]

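If you want to keep track of line numbers, you can iterate over the lines with enumerate and include the number in the values you append. I wasn't sure how you want to store all the matches, so I just assumed a list of (problem, gps) tuples.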
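Tracking line numbers as described could look like this sketch: the same loop as above with enumerate added. Storing a (line number, text) pair for both the problem line and the GPS line is just one assumed layout for the results, and find_pairs is a hypothetical name.

```python
import io

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS INFO'


def find_pairs(lines):
    """Pair each problem line with the next GPS line, keeping 1-based line numbers."""
    matches = []
    problem = None  # (line number, text) of a problem still waiting for a GPS line
    for num, line in enumerate(lines, start=1):
        if problem is None and searchstringsProblem in line:
            problem = (num, line.strip())
        elif problem is not None and searchstringsGPSLoc in line:
            matches.append((problem, (num, line.strip())))
            problem = None
    return matches


# Same sample as the "File:" listing above; io.StringIO stands in for open("test.txt").
sample_lines = [
    "random", "random", "random", "GPS INFO: 238939", "random",
    "BIG Problem Line", "random", "blah GPS INFO: 238490",
    "random GPS INFO: 325236342", "BIG Problem Line2", "GPS INFO: 12343",
]
with io.StringIO("\n".join(sample_lines) + "\n") as f:
    print(find_pairs(f))
# [((6, 'BIG Problem Line'), (8, 'blah GPS INFO: 238490')), ((10, 'BIG Problem Line2'), (11, 'GPS INFO: 12343'))]
```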

Edit: updated per a comment to support longitude/latitude:

File:

random
random
random
GPS INFO: 238939
LONGITUDE: 123
LATITUDE: 321
random
BIG Problem Line
random
blah GPS INFO: 238490
LONGITUDE: 456
LATITUDE: 654
random GPS INFO: 325236342
LONGITUDE: 789
LATITUDE: 987
BIG Problem Line2
GPS INFO: 12343
LONGITUDE: 432
LATITUDE: 678

Code:

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS INFO'
matches = []

with open("test.txt") as f:
    problem = False
    problem_line = ""

    for line in f:
        if not problem and searchstringsProblem in line:
            problem_line = line.strip()
            problem = True
        elif problem and searchstringsGPSLoc in line:
            matches.append((problem_line, line.strip(), next(f).strip(), next(f).strip()))
            problem = False

for item in matches:
    print(item)

Output:

('BIG Problem Line', 'blah GPS INFO: 238490', 'LONGITUDE: 456', 'LATITUDE: 654')
('BIG Problem Line2', 'GPS INFO: 12343', 'LONGITUDE: 432', 'LATITUDE: 678')

EDIT2: updated to ignore blank lines when looking for the longitude/latitude:

File:

BIG Problem Line

Key line2 GPS Info

GPS = Active

Longitude = -0.00000

Latitude = +51.47700

Code:

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS Info'
matches = []

with open("test.txt") as f:
    problem = False
    problem_line = ""

    for line in f:
        if not problem and searchstringsProblem in line:
            problem_line = line.strip()
            problem = True
        elif problem and searchstringsGPSLoc in line:
            latitude = ""
            longitude = ""
            for new_line in f:
                if "Longitude" in new_line:
                    longitude = new_line.split("=")[1].strip()
                elif "Latitude" in new_line:
                    latitude = new_line.split("=")[1].strip()
                if longitude and latitude:
                    break

            if latitude and longitude:
                matches.append((problem_line, line.strip(), latitude, longitude))
                problem = False

for item in matches:
    print(item)

Output:

('BIG Problem Line', 'Key line2 GPS Info', '+51.47700', '-0.00000')
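The EDIT2 logic can also be wrapped in a generator that accepts any iterable of lines, which makes it easy to test without a file and to process several problems lazily. This is a sketch, not the answer's own code: iter_problem_gps is a hypothetical name, and matching the GPS header case-insensitively is an added assumption (the question searches for 'GPS INFO' while its sample data contains "GPS Info").

```python
import io


def iter_problem_gps(lines, problem_str='BIG Problem Line', gps_str='GPS INFO'):
    """Yield (problem, gps_line, latitude, longitude) tuples.

    After a problem line, the next line containing `gps_str` (compared
    case-insensitively) starts a GPS block; blank and unrelated lines in
    that block are skipped until both coordinates have been read.
    """
    lines = iter(lines)  # one shared iterator, so every search resumes in place
    problem_line = None
    for line in lines:
        if problem_line is None and problem_str in line:
            problem_line = line.strip()
        elif problem_line is not None and gps_str.lower() in line.lower():
            latitude = longitude = ""
            for new_line in lines:  # continues from the GPS line, not from the top
                if "Longitude" in new_line:
                    longitude = new_line.split("=", 1)[1].strip()
                elif "Latitude" in new_line:
                    latitude = new_line.split("=", 1)[1].strip()
                if latitude and longitude:
                    yield (problem_line, line.strip(), latitude, longitude)
                    break
            problem_line = None


sample = io.StringIO(
    "BIG Problem Line\n\nKey line2 GPS Info\n\nGPS = Active\n\n"
    "Longitude = -0.00000\n\nLatitude = +51.47700\n"
)
for item in iter_problem_gps(sample):
    print(item)
# ('BIG Problem Line', 'Key line2 GPS Info', '+51.47700', '-0.00000')
```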

Answer 2

Edit: added support for getting the longitude and latitude. EDIT2: split correctly.

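If you want to keep track of which problem number you are on (i.e. the first occurrence of the problem, the second, and so on) or the current line, you can add an enumerate or a counter, or write some text in an else branch whenever BIG Problem Line is found.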

def get_text_file(path):
    with open(path, "r") as f:
        searchstrings = ['BIG Problem Line', 'GPS INFO']
        current_string = 0
        for line in f:
            if searchstrings[current_string] in line:
                # That is, if the current index is 1 (you're looking for GPS info)
                if current_string:
                    # Assumes the longitude and latitude lines directly follow the GPS line
                    long_line = next(f)
                    lat_line = next(f)
                    long_value = long_line.split('=')[1].strip()
                    lat_value = lat_line.split('=')[1].strip()
                    some_write_function(long_value, lat_value)  # placeholder for your own output code
                current_string ^= 1  # Flips the bit (0^1=1, 1^1=0)
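A self-contained variant of this toggle approach, with the problem counter mentioned above placed in the else branch so each result records which occurrence of the problem it belongs to. It collects results in a list instead of calling the placeholder some_write_function; collect_gps is a hypothetical name, and, like the original, it assumes the longitude and latitude lines come directly after the GPS INFO line.

```python
import io


def collect_gps(lines):
    """Return (problem_number, longitude, latitude) for each problem/GPS pair."""
    lines = iter(lines)
    searchstrings = ['BIG Problem Line', 'GPS INFO']
    current_string = 0  # 0: looking for a problem line, 1: looking for GPS info
    problem_count = 0
    results = []
    for line in lines:
        if searchstrings[current_string] in line:
            if current_string:
                # Assumes the two coordinate lines directly follow the GPS line;
                # next() raises StopIteration if the input ends early.
                long_value = next(lines).split('=')[1].strip()
                lat_value = next(lines).split('=')[1].strip()
                results.append((problem_count, long_value, lat_value))
            else:
                problem_count += 1
            current_string ^= 1  # flip between the two search strings
    return results


sample = io.StringIO(
    "BIG Problem Line\n"
    "GPS INFO\n"
    "Longitude = 123\n"
    "Latitude = 321\n"
    "BIG Problem Line\n"
    "random\n"
    "GPS INFO\n"
    "Longitude = 456\n"
    "Latitude = 654\n"
)
print(collect_gps(sample))
# [(1, '123', '321'), (2, '456', '654')]
```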

Running a nested string search with Python
