Is there a way to read a file in reverse using Python's open()?

Asked: 2019-04-26 13:53:18

Tags: python file parsing logging

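I am trying to read a server file, file.out, but I only need the most recent data within a date-time range.

Is it possible to read a file in reverse using with open() and some mode (method)?

The a+ mode gives access to the end of the file: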

    ``a+''  Open for reading and writing.  The file is created if it does not
      exist. The stream is positioned at the end of the file. Subsequent writes
      to the file will always end up at the then current end of the file, 
      irrespective of any intervening fseek(3) or similar.

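Is there a way to use a+ or another mode (method) to access the end of the file and read a specific range?

The regular r mode reads the file from the beginning: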

    with open('file.out','r') as file:

I tried using reversed():

    for line in reversed(list(open('file.out').readlines())):

but it does not return any lines for me.

Or is there another way to read a file in reverse... please help

EDIT

Here is what I have so far:

    import os
    import time
    from datetime import datetime as dt

    # date and time bounds of the window to extract
    start_0 = dt.strptime('2019-01-27','%Y-%m-%d')
    stop_0 = dt.strptime('2019-01-27','%Y-%m-%d')
    start_1 = dt.strptime('09:34:11.057','%H:%M:%S.%f')
    stop_1 = dt.strptime('09:59:43.534','%H:%M:%S.%f')

    os.system("touch temp_file.txt")
    process_start = time.perf_counter()  # time.clock() was removed in Python 3.8
    count = 0
    print("reading data...")
    # list(open(...)) loads the whole file into memory before it is reversed
    for line in reversed(list(open('file.out'))):
        try:
            th = dt.strptime(line.split()[0],'%Y-%m-%d')
            tm = dt.strptime(line.split()[1],'%H:%M:%S.%f')

            if (th == start_0) and (th <= stop_0):
                if (tm > start_1) and (tm < stop_1):
                    count += 1
                    print("%d occurrences" % (count))
                    os.system("echo '"+line.rstrip()+"' >> temp_file.txt")
            # stop once we are past the start of the window
            if (th == start_0) and (tm < start_1):
                break
        except KeyboardInterrupt:
            print("\nLast line before interrupt:%s" % (str(line)))
            break
        except IndexError:
            # line has fewer than two fields
            continue
        except ValueError:
            # line does not start with a date and time
            continue
    process_finish = time.perf_counter()
    print("Done:" + str(process_finish - process_start) + " seconds.")

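I added these limits so that, once the matching lines are found, it at least prints the occurrences and then stops reading the file.

The problem is that it does read the file, but far too slowly.

EDIT 2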

(2019-04-29 9.34am)

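All the answers I received work for reading a log in reverse, but for me (and possibly for others), when the log is n GB in size, Rocky's answer below worked best.

The code that worked for me:

(I only added a for loop to Rocky's code):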

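    import collections

    number_of_rows = 1000  # assumed value: how many trailing lines to keep

    log_lines = collections.deque()
    with open("file.out", "r") as log_file:
        for line in log_file:
            log_lines.appendleft(line)
            # drop the oldest line once the limit is exceeded
            if len(log_lines) > number_of_rows:
                log_lines.pop()

    # the list now holds the newest line first
    log_lines = list(log_lines)
    for line in log_lines:
        print(str(line).split("\n"))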

Thank you all, every answer works.

-lpkej

4 Answers:

Answer 0: (score: 0)

There certainly is:

Edit: As noted in the comments, this reads the entire file into memory. This solution should not be used with large files.
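
For large logs, one way to avoid loading everything is to read fixed-size blocks starting from the end of the file. The following is only a minimal sketch under some assumptions: `reverse_lines` and `block_size` are names made up for this illustration, lines are assumed to end in `\n`, and the file is assumed to be UTF-8.

```python
import os


def reverse_lines(path, block_size=4096, encoding="utf-8"):
    """Yield the lines of `path` from last to first, one block at a time."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = position = fh.tell()
        tail = b""          # partial line carried over from the previous block
        first_block = True
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            fh.seek(position)
            block = fh.read(read_size)
            if first_block:
                first_block = False
                # Ignore the final newline so it does not yield an empty line.
                if block.endswith(b"\n"):
                    block = block[:-1]
            block += tail
            pieces = block.split(b"\n")
            # The first piece may be an incomplete line; keep it for later.
            tail = pieces.pop(0)
            for piece in reversed(pieces):
                yield piece.decode(encoding)
        if size > 0:
            # Whatever is left is the first line of the file.
            yield tail.decode(encoding)
```

Because it is a generator, a loop such as `for line in reverse_lines("file.out"):` can `break` as soon as it reaches a timestamp older than the range of interest, so only the end of the file is ever read. Lines that end in `\r\n` keep their trailing `\r` in this sketch.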

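Answer 1: (score: 0)

This cannot be done with an argument to open, but if you want to read the last part of a large file without loading the whole file into memory (which reversed(list(fp)) would do), you can use a two-pass solution.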

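    LINES_FROM_END = 1000
    # FILEPATH is the path to the log file
    with open(FILEPATH, "r") as fin:
        # first pass: count the lines
        s = 0
        while fin.readline(): # fixed typo, readlines() will read everything...
            s += 1
        # second pass: keep only the last LINES_FROM_END lines
        fin.seek(0)
        mylines = []
        for i, e in enumerate(fin):
            if i >= s - LINES_FROM_END:
                mylines.append(e)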

This does not keep the file in memory. You can also use collections.deque to reduce it to a single pass:

    # one pass (a lot faster):
    import collections

    mylines = collections.deque()
    for line in open(FILEPATH, "r"):
        mylines.appendleft(line)
        # drop the oldest line once the limit is exceeded
        if len(mylines) > LINES_FROM_END:
            mylines.pop()

    mylines = list(mylines)
    # mylines will contain the last LINES_FROM_END lines of the file, newest first.
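
The same one-pass idea can probably be written more compactly with a bounded deque, since `collections.deque` accepts a `maxlen` argument and discards the oldest item once it is full. This is just a sketch; the function name `tail_lines` is made up for illustration.

```python
from collections import deque


def tail_lines(path, count):
    """Return the last `count` lines of `path`, oldest first, in one pass."""
    with open(path, "r") as fh:
        # A deque with maxlen drops items from the opposite end as new ones arrive.
        return list(deque(fh, maxlen=count))
```

To walk the result newest first, wrap it in `reversed(...)`. Note that this still reads the whole file sequentially; only the memory use is limited to `count` lines.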

Answer 2: (score: 0)

Another option is to map the file with mmap.mmap, then search backwards from the end with rfind for newlines and slice out the lines you need.
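
A rough sketch of that approach might look like the following. The function name `last_lines_mmap` is made up for illustration, lines are assumed to end in `\n`, and the result is returned as bytes.

```python
import mmap
import os


def last_lines_mmap(path, count):
    """Return the last `count` lines of `path` as bytes, oldest first."""
    # An empty file cannot be memory-mapped, so handle it up front.
    if count <= 0 or os.path.getsize(path) == 0:
        return []
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # Ignore a trailing newline so it is not counted as an empty line.
            if mm[end - 1:end] == b"\n":
                end -= 1
            pos = end
            for _ in range(count):
                # rfind returns -1 when there is no earlier newline.
                newline = mm.rfind(b"\n", 0, pos)
                if newline == -1:
                    start = 0
                    break
                pos = newline
            else:
                start = pos + 1
            return mm[start:end].split(b"\n")
```

Only the pages that are actually touched get read, so the cost depends mostly on how far back the search goes rather than on the total file size.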

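Answer 3: (score: 0)

Hey m8, I put together this code that works for me; it lets me read a file in reverse order. Hope it helps :) I create a new text file first, so I don't know how much of that part matters to you.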

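    def main():
        # create a sample file with ten numbered lines
        f = open("Textfile.txt", "w+")
        for i in range(10):
            f.write("line number %d\r\n" % (i+1))
        f.close()

    def readReversed():
        # loads the whole file, then iterates from the last line to the first
        for line in reversed(list(open("Textfile.txt"))):
            print(line.rstrip())

    main()
    readReversed()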