Question

Hi, I have a somewhat vague question...

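I want to build a tool that searches through log files, with the following features:

1) Search the log file until a given log line is found.
2) Having found 1), skip forward an unknown number of lines until a condition is met. At that point the data is used for some calculations.
3) Once 2) is done, return to the line found in 1) and continue going through the file.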

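Right now I can do 1) and 2) quite easily by looping over each line:

for line in file:

For 3) I plan to use something like file.seek(linenum) and carry on iterating. But is there a more efficient way to do any of the steps above?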

Thanks

Answer 1

For files, this is easily solved with tell and seek:

o = open(myfile)
# read some lines
last_position = o.tell()   # remember the current offset
# read more lines
o.seek(last_position)      # jump back to the remembered offset
# read the same lines again

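Note that, unlike what you wrote in your question, seek does not take a line number. It takes a byte offset. For ASCII files a byte offset is also a character offset, but that does not hold for most modern encodings.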
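To illustrate the byte-offset point, here is a minimal sketch. It assumes a small UTF-8 file created only for this demonstration (the contents are made up), and it opens the file in binary mode so that tell() returns plain byte offsets.

```python
import os
import tempfile

# Write a tiny UTF-8 file; "é" occupies two bytes in UTF-8.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="\n",
                                 suffix=".log", delete=False) as tmp:
    tmp.write("héllo\nworld\n")
    path = tmp.name

# Binary mode: tell() and seek() work in plain byte offsets.
with open(path, "rb") as f:
    first = f.readline()
    byte_offset = f.tell()                    # bytes consumed so far
    char_count = len(first.decode("utf-8"))   # characters consumed so far
    f.seek(byte_offset)                       # seek() takes the byte offset
    second = f.readline()

os.remove(path)
print(byte_offset, char_count, second)
```

The first line is six characters long but seven bytes long, so passing a character count (or a line number) to seek would land in the wrong place.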

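There is no "more efficient" way, AFAIK. This is very efficient from the point of view of the OS, memory, CPU and disk. From a programming point of view it is a bit clumsy, but unfortunately Python does not offer a standard way to clone an iterator.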
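As a rough sketch of how the three steps from the question could be combined with tell and seek: the function name scan and its parameters is_start, is_end and handle are made up for illustration, and the sample log is invented. It uses readline rather than a for loop because, in Python 3, calling tell on a text file while iterating over it raises an error.

```python
import io


def scan(log, is_start, is_end, handle):
    """For each line matching is_start, look ahead to the next line matching
    is_end, pass both lines to handle, then resume after the start line."""
    while True:
        line = log.readline()          # readline() keeps tell() usable
        if not line:
            break                      # end of file
        if not is_start(line):
            continue                   # step 1: keep searching
        bookmark = log.tell()          # position just after the start line
        while True:                    # step 2: skip ahead
            ahead = log.readline()
            if not ahead:
                break                  # no matching end line was found
            if is_end(ahead):
                handle(line, ahead)    # do the calculation here
                break
        log.seek(bookmark)             # step 3: go back and continue


# Hypothetical sample data standing in for a real log file.
sample = io.StringIO("START a\nnoise\nEND 1\nSTART b\nEND 2\n")
results = []
scan(sample,
     lambda l: l.startswith("START"),
     lambda l: l.startswith("END"),
     lambda start, end: results.append((start.strip(), end.strip())))
print(results)
```

Each look-ahead re-reads lines that the outer loop will read again after the seek, which is the price of rewinding. For typical log sizes the OS file cache should make that cheap.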

Answer 2

def read_until_condition(fd, condition, reset, apply=None):
    """
    Read lines from fd until condition is true or the end of file is reached.

    :fd: an open file object (needs tell, seek and readline)
    :condition (function): a function that accepts a line and returns a bool
    :reset (bool): if True then fd is returned to its initial position
    :apply (function): optional function to apply to each line read

    Returns:
    int: the position of the start of the line for which condition is True
    (or the end-of-file position if no line matched)
    """
    pos = None
    current_position = fd.tell()

    while True:
        pos = fd.tell()
        line = fd.readline()

        if line and apply is not None:
            apply(line)

        # Stop at end of file (empty string) or on the first matching line.
        if not line or condition(line):
            break

    if reset:
        fd.seek(current_position)

    return pos


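if __name__ == '__main__':

    with open('access_log', 'r') as f:
        # Step 1: advance to the first line of interest.
        cf = lambda l: l.startswith('64.242.88.10 - - [07/Mar/2004:16:54:55 -0800]')
        pos = read_until_condition(f, cf, False)
        condition = lambda l: l.startswith('lj1090.inktomisearch.com - - [07/Mar/2004:17:18:41 -0800]')

        def apply(l):
            # Lines already end with a newline, so suppress print's own.
            print(l, end='')

        # Step 2: print lines until the condition holds, then rewind (reset=True).
        read_until_condition(f, condition, True, apply)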

I don't know exactly what you need, but the code above (modified to suit your needs) should work.

I tested it on some Apache logs that I downloaded.

Answer 3

This answer implements an efficient line-based reader for large files: https://stackoverflow.com/a/23646049/34088
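Independently of the linked answer (this is not a reproduction of it), one common way to get line-based random access is to record the byte offset of every line once and then seek by line number. The helper names build_line_index and read_line below are hypothetical, and the sample data is invented.

```python
import io


def build_line_index(binary_file):
    """Return a list whose i-th entry is the byte offset of line i (0-based)."""
    offsets = []
    binary_file.seek(0)
    while True:
        pos = binary_file.tell()
        if not binary_file.readline():
            break                      # end of file
        offsets.append(pos)
    return offsets


def read_line(binary_file, offsets, lineno):
    """Jump straight to line lineno using the precomputed offsets."""
    binary_file.seek(offsets[lineno])
    return binary_file.readline()


# io.BytesIO stands in for a file opened with open(path, "rb").
data = io.BytesIO(b"first\nsecond\nthird\n")
index = build_line_index(data)
print(index)
print(read_line(data, index, 2))
```

The index costs one full pass over the file and one integer per line, so it only pays off when the same file is revisited many times.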

Python: iterating forward and backward through a text file
