Question

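Let's say I have a log file, output.log, that is continuously updated by a separate process, for example some Java code running somewhere on the system.

Now I have a separate Python process that reads the log file to analyze it and extract some data. I'm using plain, simple Python code to do this: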

with open('output.log') as f:
    for line in f:
        pass  # Do something with that line

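The problem is that I don't know how often the file is updated. If the file is being updated constantly, how does Python decide when to stop?

Shouldn't the program hang indefinitely, waiting for more data?

Thanks in advance for your answers.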

Answer 1

Generators can help a lot here.

# follow.py
#
# Follow a file like tail -f.

import time
import os

def follow(thefile):
    thefile.seek(0, os.SEEK_END)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

# Example use

if __name__ == '__main__':
    # "with" closes the file once the loop is left (e.g. via break)
    with open("run/foo/access-log", "r") as logfile:
        loglines = follow(logfile)
        for line in loglines:
            print(line, end='')

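To stop parsing the log file continuously, just break out of the final for loop.

Inside that final for loop you can do whatever you like with the lines that have been read.

To get more familiar with generators, I recommend reading Generator Tricks for Systems Programmers.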
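Because follow() above never returns on its own, break is the only way that loop ends. One possible variation, sketched here under the assumption that a long enough silence means the writer is finished, adds a hypothetical idle_timeout parameter so the generator can also end by itself. This is an illustration, not part of the original answer.

```python
import os
import time


def follow(thefile, poll_interval=0.1, idle_timeout=None):
    """Yield lines appended to thefile after the first next() call, like tail -f.

    idle_timeout is a hypothetical addition: if no new line arrives for that
    many seconds the generator returns, which ends the caller's for loop.
    None keeps the original behaviour of waiting forever.
    """
    thefile.seek(0, os.SEEK_END)  # skip everything already in the file
    last_data = time.monotonic()
    while True:
        line = thefile.readline()
        if not line:
            idle = time.monotonic() - last_data
            if idle_timeout is not None and idle >= idle_timeout:
                return  # no data for too long: stop iterating
            time.sleep(poll_interval)
            continue
        last_data = time.monotonic()
        yield line
```

For example, iterating over follow(logfile, idle_timeout=30) and calling break when a line starts with "ERROR" (an arbitrary stop condition chosen for illustration) stops on whichever happens first.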
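One idea from that style of generator programming is chaining small generators into a pipeline, so the final for loop receives lines that are already filtered and split. The sketch below is illustrative only; grep and fields are hypothetical helper names, and the regular expression and whitespace splitting are assumptions about the log format.

```python
import re
from typing import Iterable, Iterator, List, Optional


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines in which the regular expression pattern matches."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def fields(lines: Iterable[str], sep: Optional[str] = None) -> Iterator[List[str]]:
    """Yield each line split into fields (on any whitespace when sep is None)."""
    for line in lines:
        yield line.split(sep)
```

Since every stage is lazy, the same pipeline works on a plain list or on the endless follow() generator, e.g. for parts in fields(grep(r"^ERROR", follow(logfile))): ...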

Answer 2

If you want to keep reading as the file grows, use something based on the behaviour of tail -f.

An example is available at http://code.activestate.com/recipes/157035-tail-f-in-python/
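The recipe's code is not reproduced here. As a minimal sketch of the general tell()/seek() technique behind tail -f: remember the position before each readline(), and if no complete line has arrived yet, seek back and sleep. Unlike follow() in the first answer, this version starts at the beginning of the file and holds back a partially written line (no trailing newline) until it is complete. The name tail_f and the max_idle_polls parameter are hypothetical; with max_idle_polls=None it waits forever, as tail -f does.

```python
import time


def tail_f(path, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines from path, waiting for more as the file grows.

    max_idle_polls (hypothetical) stops the generator after that many
    consecutive polls without a complete new line; None means never stop.
    """
    with open(path) as f:
        idle = 0
        while True:
            where = f.tell()  # position before this read attempt
            line = f.readline()
            if line.endswith("\n"):
                idle = 0
                yield line
                continue
            # Nothing new, or only part of a line so far: rewind and wait.
            f.seek(where)
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_interval)
```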

Answer 3

The for loop reads until it reaches the current end of the file and then terminates. You could do something like this:

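#!/usr/bin/env python
import os
import sys
import time


def process_line(line):
    print(line.rstrip("\n"))


def process_file(f):
    # Stops at the file's current end; f.tell() is usable again afterwards.
    for line in f:
        process_line(line)


def tail(path):
    old_size = 0
    pos = 0  # where the previous pass stopped reading
    while True:
        new_size = os.stat(path).st_size
        if new_size > old_size:
            # Mode "U" was removed in Python 3.11; text mode already
            # handles universal newlines by default.
            with open(path) as f:
                f.seek(pos)
                process_file(f)
                pos = f.tell()
            old_size = new_size
        time.sleep(1)


if __name__ == "__main__":
    tail(sys.argv[1])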

Of course, this assumes that the file doesn't roll over and have its size reset to zero.
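Both points in this answer can be seen with one small helper: iterating over a file object stops at the file's current end, and a later pass that seeks to the saved offset picks up only what was appended. The sketch below is a hypothetical variant of one pass of tail(), not the answer's own code. It reads in binary mode so the offset is a plain byte count, holds back a trailing partial line, assumes UTF-8 content, and treats a file that has become smaller than the saved offset as truncated or rotated, restarting from zero. That size check is only a heuristic: it misses a rotation where the new file has already grown past the old offset.

```python
import os
from typing import List, Tuple


def read_new_lines(path: str, pos: int) -> Tuple[List[str], int]:
    """Return the complete lines after byte offset pos, plus the new offset."""
    if os.stat(path).st_size < pos:
        pos = 0  # file shrank: assume it was truncated or rotated, start over
    lines = []
    with open(path, "rb") as f:
        f.seek(pos)
        for raw in f:  # iteration stops at the file's current end
            if not raw.endswith(b"\n"):
                break  # partial line still being written; read it next time
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
            pos += len(raw)
    return lines, pos
```

Calling it repeatedly with time.sleep(1) between calls, and passing each returned line to process_line, gives roughly the loop of tail() above with the truncation check added.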
