Question

I'm using Python 3.4.

I have a log file like this:

10001 ...
10002 * SMTP *
10003 skip me
10004 read me
10005 read me

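The goal of the script is to read the file in reverse and iterate until I find the line that contains '* SMTP *' (line 10002 in the example). From there I have to go back forward, skip one line, and read the next two lines (lines 10004 and 10005 in the example).

How can I do this?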

Answer 1

mmap is a good way to do this:

import mmap

SEARCH_TEXT = b'* SMTP *'
SKIP_LINES = 2
KEEP_LINES = 2

with open('log.txt', 'rb') as f:
    log = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    n = log.rfind(SEARCH_TEXT)
    if n == -1:
        print('{!r} not found'.format(SEARCH_TEXT))
    else:
        log.seek(n)
        # The first readline() consumes the rest of the matching line,
        # the second consumes the line to skip.
        for _ in range(SKIP_LINES):
            log.readline()

        print(''.join(log.readline().decode() for _ in range(KEEP_LINES)))

Output

10004 read me
10005 read me

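This code memory-maps the log file and uses rfind() to search for the target string '* SMTP *' starting from the end of the file. It then positions the file pointer at the target string (using seek()), consumes the 2 unwanted lines (the remainder of the matching line and the line to skip), and finally reads the 2 lines of interest.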

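mmap is efficient because the operating system handles paging the data from disk into the application's memory. Only the pages that are actually touched get loaded, so the whole file is not read up front, which makes this a good strategy for large files.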
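The same steps can be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the function name `lines_after_last_match` and its parameters are hypothetical, and it uses `access=mmap.ACCESS_READ` instead of `prot=mmap.PROT_READ` on the assumption that the script may also need to run on Windows, where the `prot` argument is not available.

```python
import mmap


def lines_after_last_match(path, search_text, skip_lines=2, keep_lines=2):
    """Return up to keep_lines lines that follow the last search_text match.

    skip_lines counts the remainder of the matching line as the first
    skipped line, mirroring SKIP_LINES in the code above.
    Returns None if search_text (a bytes object) is not found.
    """
    with open(path, 'rb') as f:
        # ACCESS_READ requests a read-only mapping on both Unix and Windows.
        log = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            n = log.rfind(search_text)
            if n == -1:
                return None
            log.seek(n)
            for _ in range(skip_lines):
                log.readline()
            result = []
            for _ in range(keep_lines):
                line = log.readline()
                if not line:  # reached the end of the file early
                    break
                result.append(line.decode().rstrip('\r\n'))
            return result
        finally:
            log.close()
```

For the sample log in the question, `lines_after_last_match('log.txt', b'* SMTP *')` would return `['10004 read me', '10005 read me']`. Note that mmap cannot map an empty file, so an empty log raises ValueError rather than returning None.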

Answer 2

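with open('log.txt') as textfile:
    lines = textfile.read().split("\n")

lines.reverse()
# Index of the most recent line containing the marker (first match after reversal)
ind = next((i for i, line in enumerate(lines) if "* SMTP *" in line), None)
if ind is None or ind < 3:
    print("'* SMTP *' not found, or too few lines follow it")
else:
    # In the reversed list, the lines that follow the match sit at lower indices
    print(lines[ind - 2])
    print(lines[ind - 3])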

This only finds the most recent * SMTP * in your log file, and it's dirty, but it gets the job done.
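If holding the whole file in a list is a concern, a single forward pass can track the lines after the most recent match without reversing anything. This is a sketch of a different technique from the one in this answer, and it was not part of the timing comparison here; the function name `lines_after_last_match_streaming` and its parameters are hypothetical.

```python
def lines_after_last_match_streaming(path, marker, skip=1, keep=2):
    """Scan the file once and return the lines that follow the last marker line.

    skip is the number of lines to ignore right after the marker line;
    keep is the number of lines to collect after that.
    Returns None if no line contains marker.
    """
    collected = None   # lines gathered after the most recent match
    since_match = 0    # number of lines seen since that match
    with open(path) as textfile:
        for line in textfile:
            if marker in line:
                # A newer match replaces whatever was collected before.
                collected = []
                since_match = 0
                continue
            if collected is not None:
                since_match += 1
                if skip < since_match <= skip + keep:
                    collected.append(line.rstrip('\n'))
    return collected
```

This trades speed for memory: every line is still read, but only the few lines after the latest match are kept, so it may suit logs that are too large to load comfortably.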

I ran some tests to compare @mhawke's mmap solution with mine. The tests were run on a 150k-line test file.

mmap

real 0m0.024s
user 0m0.016s
sys  0m0.008s

My solution

real 0m0.038s
user 0m0.026s
sys  0m0.012s

Python: iterating over lines after reading a text file in reverse
