Question

What is the pythonic way to watch the tail end of a growing file for the occurrence of certain keywords?

In shell I might say:

tail -f "$file" | grep "$string" | while read hit; do
    #stuff
done

Answer 1

Well, the simplest way is to keep reading from the file, check for new content, and test it for hits.

import time

def watch(fn, words):
    fp = open(fn, 'r')
    while True:
        new = fp.readline()
        # Once all lines are read this just returns ''
        # until the file changes and a new line appears

        if new:
            for word in words:
                if word in new:
                    yield (word, new)
        else:
            time.sleep(0.5)

fn = 'test.py'
words = ['word']
for hit_word, hit_sentence in watch(fn, words):
    print("Found %r in line: %r" % (hit_word, hit_sentence))

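This readline-based solution works if you know your data will arrive in lines.

If the data is some sort of stream, you need a buffer larger than the longest word you are looking for, and you have to fill it first. It gets a bit more complicated that way...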

Answer 2

import time

def tail(f):
    f.seek(0, 2)

    while True:
        line = f.readline()

        if not line:
            time.sleep(0.1)
            continue

        yield line

def process_matches(matchtext):
    while True:
        line = (yield)  
        if matchtext in line:
            do_something_useful() # email alert, etc.


list_of_matches = ['ERROR', 'CRITICAL']
matches = [process_matches(string_match) for string_match in list_of_matches]    

for m in matches: # prime matches
    next(m)  # works on Python 2.6+ and Python 3, unlike m.next()

while True:
    auditlog = tail( open(log_file_to_monitor) )
    for line in auditlog:
        for m in matches:
            m.send(line)

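I use this to monitor log files. In the full implementation, I keep list_of_matches in a configuration file so it can be used for multiple purposes. On my list of enhancements is support for regular expressions instead of a simple "in" match.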
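
The regular-expression enhancement mentioned above could look roughly like the sketch below. It is not the author's implementation: process_regex_matches and the on_match callback are hypothetical names introduced here, and Python 3 is assumed.

```python
import re


def process_regex_matches(pattern, on_match):
    """Coroutine: receive lines via send() and call on_match for regex hits."""
    regex = re.compile(pattern)  # compile once, reuse for every line
    while True:
        line = (yield)
        found = regex.search(line)
        if found:
            on_match(found.group(0), line)  # hypothetical callback


hits = []
matcher = process_regex_matches(
    r"ERROR|CRITICAL", lambda text, line: hits.append((text, line))
)
next(matcher)  # prime the coroutine so it is waiting at the yield
matcher.send("2024-01-01 ERROR disk full\n")
matcher.send("2024-01-01 INFO all good\n")
```

A single alternation pattern replaces the list of separate matchers, so each line is scanned once instead of once per keyword.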

Answer 3

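You can use select to poll for new content in a file.

import os
import select

def tail(filename, bufsize=1024):
    fds = [os.open(filename, os.O_RDONLY)]
    while True:
        # Note: select() always reports regular files as readable,
        # so at end-of-file this yields empty reads without waiting.
        reads, _, _ = select.select(fds, [], [])
        if reads:
            yield os.read(reads[0], bufsize)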

Answer 4

You can use pytailf: a simple Python tail -f wrapper

from tailf import tailf    

for line in tailf("myfile.log"):
    print(line)

Answer 5

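EDIT: as the comment below notes, O_NONBLOCK doesn't work for files on disk. This will still help anyone who comes along looking to tail data from a socket, a named pipe, or another process, but it doesn't answer the actual question that was asked. The original answer remains below for posterity. (Calling out to tail and grep will work, but is a non-answer of sorts anyway.)

Open the file with O_NONBLOCK, use select to poll for read availability, then use read to fetch the new data and the string methods to filter lines at the end of the file... or just use the subprocess module and let tail and grep do the work for you, just as you would in the shell.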
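
For the pipe or socket case that this answer still applies to, a minimal sketch of polling with select might look like the following. It assumes a POSIX system (on Windows, select only accepts sockets), it relies on select's timeout rather than O_NONBLOCK, and read_available is a hypothetical helper name.

```python
import os
import select


def read_available(fd, timeout=1.0, bufsize=1024):
    """Return bytes readable from fd within timeout seconds, or None if none arrive."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None  # timed out: nothing to read yet
    return os.read(fd, bufsize)  # b"" here means the writer closed its end


# Demonstrate with an anonymous pipe standing in for a named pipe or socket.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"ERROR something broke\n")
data = read_available(read_fd)
if data and b"ERROR" in data:
    print("found")
os.close(read_fd)
os.close(write_fd)
```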

Answer 6

Looks like there's a package for that: https://github.com/kasun/python-tail

Answer 7

If you cannot constrain the problem to a line-based read, you need to resort to blocks.

This should work:

import sys

needle = "needle"

blocks = []

inf = sys.stdin

if len(sys.argv) == 2:
    inf = open(sys.argv[1])

while True:
    block = inf.read(4096)  # fixed-size block; read() with no size reads to EOF
    blocks.append(block)
    if len(blocks) >= 2:
        data = "".join((blocks[-2], blocks[-1]))
    else:
        data = blocks[-1]

    # attention, this needs to be changed if you are interested
    # in *all* matches separately, not if there was any match at all
    if needle in data:
        print("found")
        blocks = []
    blocks[:-2] = []

    if block == "":
        break

The challenge lies in ensuring that you match the needle even if it is split across a block boundary.
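
One way to handle a match that straddles a boundary, sketched here under the assumption that the input is text, is to carry the last len(needle) - 1 characters of each block over to the next read. find_in_stream and its block_size parameter are hypothetical names, not part of the answer above.

```python
import io


def find_in_stream(stream, needle, block_size=4096):
    """Return True if needle occurs anywhere in stream, reading block by block."""
    overlap = len(needle) - 1  # longest prefix of needle a block can end with
    carry = ""
    while True:
        block = stream.read(block_size)
        if not block:
            return False  # end of stream reached without a match
        data = carry + block
        if needle in data:
            return True
        # Keep only the tail that could still begin a match.
        carry = data[-overlap:] if overlap > 0 else ""


# The needle is split across the 4-character block boundary: "xxne" + "edle".
print(find_in_stream(io.StringIO("xxneedle"), "needle", block_size=4))  # True
```

Carrying only the overlap keeps memory use bounded by the block size plus the needle length, however long the stream runs. For a file that is still growing, an empty read would mean "nothing new yet" rather than "finished", so the return on end of stream would become a sleep and retry.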

Answer 8

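If you just need a dead simple Python 3 solution for processing the lines of a text file as they are written, and you don't need Windows support, this worked well for me:

import subprocess

def tailf(filename):
    # returns lines from a file, starting from the beginning
    # (pass the arguments as a list so filenames with spaces work)
    command = ["tail", "-n", "+1", "-F", filename]
    p = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True)
    for line in p.stdout:
        yield line

for line in tailf("logfile"):
    pass  # do stuff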

It blocks while waiting for new lines to be written, so it isn't suitable for asynchronous use without some modifications.
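
One possible modification for asynchronous use, assuming Python 3.7+ and asyncio, is sketched below. follow_command and collect are hypothetical names; the sketch takes the command as a list so it could wrap tail -F or any other line-producing process, and it omits cleanup of the child process when the consumer stops early.

```python
import asyncio
import sys


async def follow_command(cmd):
    """Asynchronously yield decoded lines from the stdout of cmd (a list of args)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    async for raw in proc.stdout:  # yields control while waiting for output
        yield raw.decode()
    await proc.wait()  # reap the child once its output is exhausted


async def collect(cmd):
    """Gather every line; only sensible for commands that terminate."""
    return [line async for line in follow_command(cmd)]


# Stand-in for ["tail", "-n", "+1", "-F", "logfile"], which would never finish.
demo_cmd = [sys.executable, "-c", "print('one'); print('two')"]
lines = asyncio.run(collect(demo_cmd))
```

With a real tail -F command the generator never ends, so it would be consumed with async for inside a long-running task rather than collected into a list.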

Answer 9

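As far as I know, there is no equivalent to "tail" in the Python function list. The solution would be to use tell() (to get the file size) and read() to work out the ending lines.

This blog post (not by me) has the function written out, and it looks appropriate to me! http://www.manugarg.com/2007/04/real-tailing-in-python.html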
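
A rough sketch of that idea, reading backwards from the end in fixed-size blocks until enough newlines have been seen, is shown below. It is not the code from the linked post; last_lines and block_size are hypothetical names, and the file is opened in binary mode so that seek offsets are plain byte counts.

```python
import os
import tempfile


def last_lines(path, n=10, block_size=1024):
    """Return the last n lines of the file at path as a list of bytes objects."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # offset at the end equals the file size
        data = b""
        # Walk backwards until we hold more than n newlines or reach the start.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return data.splitlines()[-n:]


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"a\nb\nc\nd\n")
print(last_lines(tmp.name, n=2, block_size=3))  # [b'c', b'd']
os.remove(tmp.name)
```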

Answer 10

You can use collections.deque to implement tail.

From http://docs.python.org/library/collections.html#deque-recipes ...

from collections import deque

def tail(filename, n=10):
    'Return the last n lines of a file'
    return deque(open(filename), n)

Of course, this reads the whole file contents, but it is a neat and terse way of implementing tail.

How to implement a pythonic equivalent of tail -F?
