Question

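Suppose I have to read a file (fairly large, about 20,000 lines). I have to iterate over the lines and look for a keyword, e.g. STACKOVERFLOW. Once the keyword is found, I know that I will have to process the next 10 lines.

This is what I am currently doing: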

with open(filepath) as f:
    for line_idx, line in enumerate(f):
        if re.match(my_keyword, line):
            # do something here from line_idx to line_idx + 9
            # can i jump directly to line_idx + 10 ???

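Is there a way to skip the processing (loop + search) of the next 10 lines once the keyword is found, and then continue looping and searching from there (e.g. at line_idx + 10)?

Thanks!

Update

I would like to add that what I want is a way that does not require temporarily saving the file into a list. With that approach, I already have a solution.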

Answer 1

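You can use an index-based loop instead of a for-each loop. It has to be a while loop, because reassigning i inside "for i in range(...)" has no effect on the next iteration:

import re

with open(filepath) as f:
    lines = f.readlines()

i = 0
while i < len(lines):
    if re.match(my_keyword, lines[i]):
        # do something with lines[i:i + 10]
        i += 10  # jump past the block that was just processed
    else:
        i += 1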

Be aware that this uses more memory than what you are currently doing, because the whole file is read into memory at once.
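One caveat worth illustrating for index-based loops: in Python, assigning to the loop variable of "for i in range(...)" does not change which value the next iteration receives, so skipping ahead needs a while loop with a manually managed index. A small sketch (the function names are made up for illustration):

```python
def visited_with_for(n):
    """Record which indices a for/range loop visits despite `i += 3`."""
    visited = []
    for i in range(n):
        visited.append(i)
        i += 3  # has no lasting effect: range() supplies the next value anyway
    return visited


def visited_with_while(n):
    """Record which indices a while loop visits when the index is advanced manually."""
    visited = []
    i = 0
    while i < n:
        visited.append(i)
        i += 3  # the jump is kept, because nothing else reassigns i
    return visited


for_result = visited_with_for(6)
while_result = visited_with_while(6)
```

Here the for loop still visits every index, while the while loop visits only every third one.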

Alternatively, if reading the whole file into memory is a problem, you can hack something together with a counter:

with open(filepath) as f:
    skip = 0
    for line in f:
        if skip <= 0:
            if re.match(my_keyword, line):
                skip = 10
        else:
            skip -= 1
            print(line) # The next ten lines after a match can be processed here
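Another way to avoid the list, sketched here under the assumption that the ten lines after the keyword line are the ones to process, is to pull them from the same iterator with itertools.islice. The outer loop then resumes right after the block. The helper name blocks_after_keyword and its block_size parameter are hypothetical, and io.StringIO stands in for an open file:

```python
import io
import re
from itertools import islice


def blocks_after_keyword(lines, keyword, block_size=10):
    """Yield the (up to) block_size lines that follow each line matching keyword."""
    it = iter(lines)
    pattern = re.compile(keyword)
    for line in it:
        if pattern.match(line):
            # islice consumes from the same iterator, so the for loop
            # continues with the first line after the block.
            yield list(islice(it, block_size))


# io.StringIO behaves like an open text file for line iteration.
sample = io.StringIO("a\nSTACKOVERFLOW\n1\n2\n3\nb\nSTACKOVERFLOW\n4\n")
blocks = list(blocks_after_keyword(sample, "STACKOVERFLOW", block_size=3))
```

With a real file this would be called as blocks_after_keyword(f, my_keyword) inside the with block. If the file ends before a block is complete, the last block is simply shorter.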

Answer 2

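A possible solution could be:

import re

with open(filepath, "r") as f:
    lines = f.readlines()

req_lines = []
count = 0
while count < len(lines):
    if re.match(my_keyword, lines[count]):
        # collect the (up to) 10 lines that follow the keyword line
        req_lines.extend(lines[count + 1:count + 11])
        count += 10  # skip the lines that were just collected
    count += 1

The lines you need are now in the variable named "req_lines", and you can do whatever you want with them.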

Changing the index while iterating over a file in Python
