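A way to read a text file line by line in reverse order?

Time: 2019-02-28 18:25:30

Tags: python for-loop iterator

I want to read the text file given below line by line, in reverse. I don't want to use readlines() or read().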

a.txt

2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:

Expected result:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr

My solution:

with open('a.txt') as lines:
    for line in reversed(lines):
        print(line)

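3 answers:

Answer 0 (score: 3)

Here is a way to do it without reading the whole file into memory at once. It does have to read through the whole file first, but it only stores where each line starts. Once those offsets are known, it can use the seek() method to access the lines in any order.

Here is an example using your input file: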

# Preprocess - read whole file and note where lines start.
# (Needs to be done in binary mode.)
with open('text_file.txt', 'rb') as file:
    offsets = [0]  # First line is always at offset 0.
    for line in file:
        offsets.append(file.tell())  # Append where *next* line would start.

# Now reread lines in file in reverse order.
with open('text_file.txt', 'rb') as file:
    for index in reversed(range(len(offsets)-1)):
        file.seek(offsets[index])
        size = offsets[index+1] - offsets[index]  # Difference with next.
        # Read bytes, convert them to a string, and remove whitespace at end.
        line = file.read(size).decode().rstrip()
        print(line)

Output:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr

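Update

Here is a version that does the same thing but uses Python's mmap module to memory-map the file, which should give better performance by taking advantage of the virtual-memory capabilities of the OS and hardware.

This is because, as PyMOTW-3 puts it:

"Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers; the memory is accessed directly by both the kernel and the user application."

Code: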

import mmap

with open('text_file.txt', 'rb') as file:
    with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:

        # First preprocess the file and note where lines start.
        # (Needs to be done in binary mode.)
        offsets = [0]  # First line is always at offset 0.
        for line in iter(mm_file.readline, b""):
            offsets.append(mm_file.tell())  # Append where *next* line would start.

        # Now process the lines in file in reverse order.
        for index in reversed(range(len(offsets)-1)):
            mm_file.seek(offsets[index])
            size = offsets[index+1] - offsets[index]  # Difference with next.
            # Read bytes, convert them to a string, and remove whitespace at end.
            line = mm_file.read(size).decode().rstrip()
            print(line)
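
As a variation on the memory-mapped version, mmap objects also have an rfind() method, so the offset-collecting pass can be skipped by searching backwards for newlines directly. The following is only a sketch under a few assumptions: the function name mmap_lines_reversed is made up for illustration, lines end with "\n", and the encoding never uses the newline byte inside a multi-byte character (true for UTF-8 and ASCII).

```python
import mmap
import os


def mmap_lines_reversed(path, encoding="utf-8"):
    """Yield the lines of a file, last line first, without trailing newlines.

    Hypothetical helper for illustration; assumes "\n" line endings.
    """
    with open(path, "rb") as file:
        if os.fstat(file.fileno()).st_size == 0:
            return  # An empty file cannot be memory-mapped and has no lines.
        with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:
            end = len(mm_file)
            if mm_file[end - 1:end] == b"\n":
                end -= 1  # The final newline only terminates the last line.
            while end >= 0:
                # rfind() returns -1 when no newline precedes `end`,
                # which makes `start` 0, the beginning of the file.
                start = mm_file.rfind(b"\n", 0, end) + 1
                yield mm_file[start:end].decode(encoding)
                end = start - 1  # Step over the newline just found.
```

Calling `for line in mmap_lines_reversed('text_file.txt'): print(line)` should print the lines in the same order as the output above, and the file is scanned only once.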

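Answer 1 (score: 2)

No, there isn't a better way to do this. By definition, a file is a sequential organization of some basic data type. For a text file, that type is the character. You are trying to impose a different organization on the file: strings separated by newlines.

So you have to do the work yourself: read the file, recast it into the format you want, and then process that organization in reverse order. For instance, if you need to do this more than once, read the file as lines, store the lines as database records, and then iterate over the records in whatever order you need.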
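
The database suggestion could look roughly like this with the standard-library sqlite3 module. This is a sketch, not a prescription: the table name log_lines, the column names, and both function names are invented for the example, and an in-memory database is used only to keep it self-contained (a database file on disk would be the natural choice for a large log).

```python
import sqlite3


def store_lines(connection, lines):
    """Store each line as one record; ids increase in file order."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS log_lines (id INTEGER PRIMARY KEY, text TEXT)"
    )
    connection.executemany(
        "INSERT INTO log_lines (text) VALUES (?)",
        ((line.rstrip("\n"),) for line in lines),
    )
    connection.commit()


def iter_lines_descending(connection):
    """Yield the stored lines from last to first."""
    cursor = connection.execute("SELECT text FROM log_lines ORDER BY id DESC")
    for (text,) in cursor:
        yield text


connection = sqlite3.connect(":memory:")
store_lines(connection, ["first\n", "second\n", "third\n"])
for text in iter_lines_descending(connection):
    print(text)
```

Passing an open file object as `lines` works too, since iterating over it yields one line at a time. The file is still read once, front to back, but afterwards the records can be traversed in any order as often as needed.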

The file interface reads in only one direction. You can seek() to another position, but the standard I/O operations only work toward increasing positions.

For your solution to work, you would need to read in the entire file; you can't reverse the file object's implicit iterator.
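
A small sketch of why the attempt in the question fails: reversed() needs either a __reversed__ method or a sequence with a length and indexing, and file objects provide neither. Here io.StringIO stands in for a real file so the example is self-contained, and the helper name try_reversed is made up for the demonstration.

```python
import io


def try_reversed(lines):
    """Return the items in reverse order, or the error text if reversed() refuses."""
    try:
        return list(reversed(lines))
    except TypeError as error:
        return f"TypeError: {error}"


sample = io.StringIO("first\nsecond\nthird\n")

# A file-like object is only a forward iterator, so this reports a TypeError.
print(try_reversed(sample))

# Materializing the lines in a list works, at the cost of holding them all in memory.
sample.seek(0)
print(try_reversed(list(sample)))  # ['third\n', 'second\n', 'first\n']
```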

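Answer 2 (score: 0)

@martineau's solution gets the job done without loading the entire file into memory, but it still wastes time by reading the entire file twice.

An arguably more efficient, single-pass approach is to read from the end of the file into a buffer in reasonably large chunks and search backwards from the end of the buffer for the next newline (ignoring the trailing newline in the last character position). If none is found, seek further back, read another chunk, and prepend it to the buffer, repeating until a newline is found. As long as it stays within your memory limits, use a larger chunk size to make the reads more efficient: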

class ReversedTextReader:
    def __init__(self, file, chunk_size=50):
        self.file = file
        file.seek(0, 2)
        self.position = file.tell()
        self.chunk_size = chunk_size
        self.buffer = ''

    def __iter__(self):
        return self

    def __next__(self):
        if not self.position and not self.buffer:
            raise StopIteration
        while True:
            # Look for the newline that ends the previous line, skipping the
            # buffer's last character (the current line's own trailing newline).
            line_start = self.buffer.rfind('\n', 0, len(self.buffer) - 1)
            if line_start != -1 or not self.position:
                break  # Found a line boundary, or reached the start of the file.
            # Read the preceding chunk and put it in front of the buffer.
            chunk_size = min(self.chunk_size, self.position)
            self.position -= chunk_size
            self.file.seek(self.position)
            self.buffer = self.file.read(chunk_size) + self.buffer
        line_start += 1  # Becomes 0 when no newline was found.
        line = self.buffer[line_start:]
        self.buffer = self.buffer[:line_start]
        return line

so that:

from io import StringIO

f = StringIO('''2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
''')

for line in ReversedTextReader(f):
    print(line, end='')

Output:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
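
The demonstration above uses StringIO. For a file on disk opened in text mode, the Python documentation only guarantees seek() for offsets returned by tell() (or zero), so one way to apply the same backwards-chunk idea is to open the file in binary mode and decode each line after it has been isolated. The following is a sketch under assumptions, not a drop-in replacement: the function name iter_lines_reversed is invented here, lines are assumed to end with "\n", and the encoding is assumed never to use the newline byte inside a multi-byte character (true for UTF-8 and ASCII).

```python
import os


def iter_lines_reversed(path, chunk_size=8192, encoding="utf-8"):
    """Yield the lines of a file, last line first, without trailing newlines.

    Hypothetical helper for illustration; reads backwards in binary chunks.
    """
    with open(path, "rb") as file:
        file.seek(0, os.SEEK_END)
        position = file.tell()
        if position == 0:
            return  # Empty file: no lines.
        pending = b""  # Tail of a line whose beginning lies in an earlier chunk.
        at_end = True
        while position > 0:
            size = min(chunk_size, position)
            position -= size
            file.seek(position)
            block = file.read(size)
            if at_end and block.endswith(b"\n"):
                block = block[:-1]  # The final newline only terminates the last line.
            at_end = False
            pieces = (block + pending).split(b"\n")
            # The first piece may continue into the chunk before this one,
            # so hold it back; every other piece is a complete line.
            pending = pieces[0]
            for raw_line in reversed(pieces[1:]):
                yield raw_line.decode(encoding)
        yield pending.decode(encoding)  # The first line of the file.
```

Used as `for line in iter_lines_reversed('a.txt'): print(line)`, this should print the same lines as the output above while reading each byte of the file only once.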