Question

I'm writing a program that periodically parses an Apache log file to record visitors, bandwidth usage, etc.

The problem is that I don't want to open the log and parse data I've already parsed. For example:

line1
line2
line3

If I parse that file, I'll save all the lines and then save the offset. That way, when I parse it again, I get:

line1
line2
line3 - The log will open from this point
line4
line5

On the second run, I'd get line4 and line5. Hopefully that makes sense...

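What I need to know is: how do I do this? Python has the seek() function to specify an offset... so do I just get the size of the log file (in bytes) after parsing it, and then use that as the offset (in seek()) the next time I parse it?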
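The approach proposed in the question can be sketched as follows. This is a minimal sketch, assuming the log is only ever appended to (no rotation or truncation) and is UTF-8; the function name read_new_lines is hypothetical. The file is opened in binary mode so the offset is a plain byte count, and the offset only advances past the last newline, so a line Apache is still writing is picked up whole on the next run.

```python
import os


def read_new_lines(log_path, offset):
    """Return (new_lines, new_offset) for complete lines written after `offset`.

    Assumes an append-only, UTF-8 log; rotation is not handled here.
    """
    with open(log_path, "rb") as log:   # binary mode: offsets are plain byte counts
        log.seek(offset)                # skip everything parsed on earlier runs
        data = log.read()
    end = data.rfind(b"\n") + 1         # stop after the last complete line
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end          # persist the second value for the next run
```

Calling it first with offset 0 and then with the returned offset gives exactly the "line4 and line5 on the second run" behaviour described above.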

I just can't think of a way to code this >.<

Answer 1

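You can manage your position in the file with the seek and tell methods of file objects: https://docs.python.org/2/tutorial/inputoutput.html

The tell method tells you where to seek to the next time you open the file.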
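A minimal illustration of that tell()/seek() round trip (the sample file and its contents are made up for the demo). Opening in binary mode means tell() reports a plain byte count; in Python 3 text mode, tell() returns an opaque number that is only meant to be passed back to seek().

```python
import os
import tempfile

# Write a small sample log to a temporary file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "sample.log")
with open(path, "wb") as f:
    f.write(b"line1\nline2\nline3\n")

# First run: read two lines, then ask tell() where reading stopped.
with open(path, "rb") as f:
    f.readline()
    f.readline()
    saved = f.tell()          # 12: the byte offset just after "line2\n"

# Later run: seek() straight back to that offset and continue from there.
with open(path, "rb") as f:
    f.seek(saved)
    rest = f.readline()       # b"line3\n"
```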

Answer 2

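log = open('myfile.log')
pos = open('pos.dat', 'w')
print(log.readline())
pos.write(str(log.tell()))  # remember how far this run read
log.close()
pos.close()

log = open('myfile.log')
pos = open('pos.dat')
log.seek(int(pos.readline()))  # jump straight to where the previous run stopped
print(log.readline())
log.close()
pos.close()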

Of course, you shouldn't use it like that; you should wrap the operations in functions like save_position(myfile) and load_position(myfile). But the functionality is all there.
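One possible shape for those wrapper functions, as a sketch only: the "<log>.pos" file naming and the temp-file-then-rename write are assumptions, not part of the answer. load_position returns 0 when no position has been saved yet, so the first run starts at the beginning of the log.

```python
import os


def position_file(log_path):
    # Hypothetical convention: keep the offset next to the log as "<log>.pos".
    return log_path + ".pos"


def load_position(log_path):
    """Return the saved byte offset for log_path, or 0 if nothing was saved yet."""
    try:
        with open(position_file(log_path)) as pos:
            return int(pos.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_position(log_path, offset):
    """Write the offset to a temp file, then swap it into place."""
    tmp_path = position_file(log_path) + ".tmp"
    with open(tmp_path, "w") as pos:
        pos.write(str(offset))
    # os.replace overwrites the old position file; on POSIX the rename is atomic,
    # so a crash mid-write cannot leave a half-written offset behind.
    os.replace(tmp_path, position_file(log_path))
```

A run then opens the log in "rb" mode, calls log.seek(load_position(path)), processes the lines, and finishes with save_position(path, log.tell()).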

Answer 3

If your log file fits easily in memory (which it should, given a reasonable rotation policy), you can simply do something like this:

with open('logfile') as f:
    log_lines = f.readlines()
last_line = get_last_lineprocessed()  # from some persistent storage
parse_log(log_lines[last_line:])      # only the lines not seen on earlier runs
store_last_lineprocessed(len(log_lines))

If you can't do that, you can use something like the answers to "Get last n lines of a file with Python, similar to tail" (see the accepted answer's use of seek and tell, in case you need them).

Answer 4

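If you parse the log line by line, you can just save the line number from the last parse. Next time, you start reading from the right line.

Seeking is more useful when you need to be at a very specific position in the file.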
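A minimal sketch of the saved-line-number approach, using itertools.islice to skip the lines handled on earlier runs (the function name read_after_line is hypothetical). Unlike seek(), islice still has to read through every skipped line, which is why seeking wins when you know the exact byte position. It also assumes every line is complete; a half-written last line would be counted as done.

```python
from itertools import islice


def read_after_line(log_path, lines_done):
    """Return (new_lines, total_lines) for lines after the first `lines_done`.

    The skipped lines are still read from disk, one at a time, but are not kept.
    """
    with open(log_path) as log:
        new_lines = [line.rstrip("\n") for line in islice(log, lines_done, None)]
    return new_lines, lines_done + len(new_lines)
```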

Answer 5

Easy, but not recommended :) :

last_line_processed = get_last_line_processed()  # e.g. from some persistent storage
with open('file.log') as log:
    for record_number, record in enumerate(log):
        # every line is still read; the old ones are just not parsed
        if record_number >= last_line_processed:
            parse_log(record)

Answer 6

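Note that in Python you can also seek() relative to the end of the file:

f.seek(-3, os.SEEK_END)  # Python 3 only allows this on files opened in binary mode ('rb')

This puts the read position 3 bytes before EOF.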
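A small sketch of end-relative seeking (the sample file is made up for the demo). The offset counts bytes, not lines, and Python 3 raises io.UnsupportedOperation for a nonzero end-relative seek on a file opened in text mode, so the file has to be opened in binary mode:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.log")
with open(path, "wb") as f:
    f.write(b"line1\nline2\nline3\n")

# Binary mode: seek 3 bytes back from the end and read what is left.
with open(path, "rb") as f:
    f.seek(-3, os.SEEK_END)
    tail = f.read()                  # b"e3\n": the last 3 bytes, not 3 lines

# Text mode: the same call is rejected.
text_mode_failed = False
try:
    with open(path) as f:
        f.seek(-3, os.SEEK_END)
except io.UnsupportedOperation:
    text_mode_failed = True
```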

But why not use diff in the shell, or difflib?

Answer 7

Here is code demonstrating your length idea and the tell method:

beginning="""line1
line2
line3"""

end="""- The log will open from this point
line4
line5"""

openfile = open('log.txt', 'w')
openfile.write(beginning)
endstarts = openfile.tell()  # position just past the first batch, as stored on disk
openfile.close()

with open('log.txt', 'a') as appendfile:
    appendfile.write(end)
with open('log.txt') as readfile:
    print(readfile.read())

print("\nAgain:")
end2 = open('log.txt', 'r')
end2.seek(len(beginning))

print(end2.read())  # offset two bytes too small on Windows: text mode stores each "\n" as "\r\n"
end2.seek(endstarts)

print("\nOk in Windows also")
print(end2.read())
end2.close()
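Opening the files in binary mode sidesteps the Windows newline translation: b"\n" is written as a single byte on every platform, so the length of what you wrote is exactly the byte offset. A sketch of the same experiment using a temporary file instead of log.txt:

```python
import os
import tempfile

beginning = b"line1\nline2\nline3\n"
end = b"line4\nline5\n"
path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "wb") as f:         # binary: no newline translation on any OS
    f.write(beginning)
    endstarts = f.tell()            # equals len(beginning)

with open(path, "ab") as f:
    f.write(end)

with open(path, "rb") as f:
    f.seek(len(beginning))          # safe here: length in bytes == byte offset
    new_part = f.read()
```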

Answer 8

Here is an efficient and safe snippet that stores the read offset in a parallel file. It is basically logtail in Python.

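import os

# filename, OFFSET_ROOT_DIR and new_logrows_handler are supplied by your program.
with open(filename) as log_fd:
    # lstrip keeps an absolute log path from discarding OFFSET_ROOT_DIR in join()
    offset_filename = os.path.join(OFFSET_ROOT_DIR, filename.lstrip(os.sep))
    if not os.path.exists(offset_filename):
        os.makedirs(os.path.dirname(offset_filename), exist_ok=True)
        with open(offset_filename, 'w') as offset_fd:
            offset_fd.write(str(0))
    with open(offset_filename, 'r+') as offset_fd:
        log_fd.seek(int(offset_fd.readline() or 0))  # resume from the saved offset
        new_logrows_handler(log_fd.readlines())
        offset_fd.seek(0)
        offset_fd.write(str(log_fd.tell()))  # save where this run stopped
        offset_fd.truncate()  # drop leftover digits if the new value is shorter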
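The snippet above always trusts the saved offset, but Apache logs are usually rotated. A hedged variant that also resets after rotation is sketched below; the rotation heuristic (a different inode, or a file shorter than the saved offset) is an assumption, not part of the answer, and the function name read_new_rows and the "<inode> <offset>" state format are hypothetical. It also leaves a half-written last line for the next run.

```python
import os


def read_new_rows(log_path, offset_path):
    """Return the complete lines appended to log_path since the previous call.

    State is stored in offset_path as "<inode> <byte offset>". Assumption: a
    rotated log shows up as a new inode or as a file shorter than the saved
    offset; in either case reading restarts from byte 0.
    """
    with open(log_path, "rb") as log:
        st = os.fstat(log.fileno())      # stat the file we actually opened
        offset = 0
        try:
            with open(offset_path) as fd:
                saved_inode, saved_offset = (int(x) for x in fd.read().split())
            if saved_inode == st.st_ino and saved_offset <= st.st_size:
                offset = saved_offset
        except (FileNotFoundError, ValueError):
            pass                         # no state yet, or unreadable: start over
        log.seek(offset)
        data = log.read()

    end = data.rfind(b"\n") + 1          # keep a half-written last line for next time
    os.makedirs(os.path.dirname(os.path.abspath(offset_path)), exist_ok=True)
    with open(offset_path, "w") as fd:
        fd.write(f"{st.st_ino} {offset + end}")
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

The inode check catches a rotated file that has already grown past the old offset, which a size comparison alone would miss.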

Python - How to open a file and specify the offset in bytes?
