Parse a huge text file and get the previous and next lines around each match

Date: 2015-03-19 16:02:45

Tags: python

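I have a huge text file of about 500 MB. I need to print every line that matches the input, together with the 3 lines before it and the 3 lines after it.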

My text file looks like this:

...
...
...
benz is a nice car
...
...
...
its also said benz is a safe car
...
...
...

If the user enters 'benz', the program should print the 3 lines before and the 3 lines after every match.

My code:

users= raw_input('enter the word:')
with open('mytext.txt','rb') as f:
     for line in f:
         if users in line:
            print line(i-3)
            print line
            print line(i+3)

But I get a "not defined" error.
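The loop above never defines `i`, and `line` is a string rather than a list, so `line(i-3)` cannot fetch neighbouring lines. As a minimal sketch of the index-based idea, assuming the file fits comfortably in memory (a 500 MB file may not), `enumerate` can supply the index and list slicing can pick out the context. The helper name `context_blocks` is hypothetical.

```python
def context_blocks(filename, word, before=3, after=3):
    """Return one list of lines per match: up to `before` lines, the match, up to `after` lines.

    Hypothetical helper. It reads the whole file into memory, so it only
    suits files that fit comfortably in RAM.
    """
    with open(filename) as f:
        lines = f.read().splitlines()
    blocks = []
    for i, line in enumerate(lines):  # enumerate supplies the index i that the original loop lacked
        if word in line:
            start = max(0, i - before)  # clamp: a negative start would wrap around to the end of the list
            blocks.append(lines[start:i + after + 1])  # slicing past the end is safe in Python
    return blocks
```

A caller could then print each block with `print("\n".join(block))` followed by a separator line. For files that do not fit in memory, a streaming approach that keeps only a few lines at a time is the better fit.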

3 answers:

Answer 0 (score: 2)

Use grep:

$ grep -C 3 benz mytext.txt

Answer 1 (score: 1)

I wrote a small function that may be useful in your case:

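from collections import deque
from itertools import islice

def search_cont(filename, search_for, num_before, num_after):
    with open(filename) as f:
        # The last num_before lines seen before the current line
        before_lines = deque(maxlen=num_before)
        # The current line plus up to num_after lines of look-ahead;
        # islice avoids a StopIteration crash on files shorter than num_after+1 lines
        after_lines = deque(islice(f, num_after + 1), maxlen=num_after + 1)
        while after_lines:
            current_line = after_lines.popleft()
            if search_for in current_line:
                print("".join(before_lines))
                print(current_line)
                print("".join(after_lines))
                print("-----------------------")
            before_lines.append(current_line)
            # Refill the look-ahead window; next() returns None at end of file
            next_line = next(f, None)
            if next_line is not None:
                after_lines.append(next_line)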

For your example, you can call it like this:

search_for = raw_input('enter the word:')  # on Python 3, use input() instead
search_cont('mytext.txt', search_for, 3, 3)

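This solution places no upper limit on file size (unless individual lines are extremely long), because it never holds more than num_before + num_after + 1 lines in memory at once (7 lines in this example).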
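If you would rather process matches than print them, the same sliding-window idea can be written as a generator. This is a sketch, not part of the answer above: the name `iter_context` is hypothetical, and it accepts any iterable of lines (an open file, a list, and so on) while still keeping at most num_before + num_after + 1 lines in memory.

```python
from collections import deque
from itertools import islice


def iter_context(lines, search_for, num_before=3, num_after=3):
    """Yield (before, line, after) for every line that contains search_for.

    Hypothetical generator variant of search_cont. `before` holds up to
    num_before preceding lines and `after` up to num_after following lines.
    """
    it = iter(lines)
    before = deque(maxlen=num_before)          # lines already seen
    ahead = deque(islice(it, num_after + 1))   # current line + look-ahead
    while ahead:
        current = ahead.popleft()              # ahead now holds only the following lines
        if search_for in current:
            yield list(before), current, list(ahead)
        before.append(current)
        nxt = next(it, None)                   # assumes no line is None (true for files)
        if nxt is not None:
            ahead.append(nxt)
```

Usage with a file might look like `with open('mytext.txt') as f:` followed by `for before, line, after in iter_context(f, 'benz'): print(''.join(before + [line] + after))`. Copying the deques into lists before yielding means each result stays valid after the window moves on.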

Answer 2 (score: 0)

You can call grep from Python:

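import subprocess
# -B 3 / -A 3 print 3 lines before / after each match; check_output raises
# CalledProcessError if grep finds nothing (exit status 1)
result = subprocess.check_output(["grep", "-A", "3", "-B", "3", "benz", "mytext.txt"])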
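Because `check_output` raises an exception when grep exits with status 1 (no match), a wrapper that treats "no match" as an empty result may be more convenient. This is a sketch assuming GNU or BSD grep is on the PATH (`-A`/`-B` are common extensions, not POSIX); the helper name `grep_context` is hypothetical. `-F` makes grep match a fixed string, like Python's `in`, and `--` stops a pattern that starts with `-` from being read as an option.

```python
import subprocess


def grep_context(pattern, filename, before=3, after=3):
    """Return grep's context output for pattern in filename, or "" if nothing matches.

    Hypothetical helper. grep exits with 0 on a match, 1 when nothing
    matches, and 2 or higher on errors such as a missing file.
    """
    proc = subprocess.run(
        ["grep", "-F", "-B", str(before), "-A", str(after), "--", pattern, filename],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    if proc.returncode == 1:  # no matching lines
        return ""
    if proc.returncode != 0:  # a real error, e.g. the file does not exist
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return proc.stdout
```

For example, `print(grep_context("benz", "mytext.txt"), end="")` prints each match with its context; grep itself inserts `--` lines between non-adjacent groups.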