Question

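Having gone through several different ways of reading a file in Python, I'd like to know which one is the fastest.

For example, to read the last line of a file, one can do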

input_file = open('mytext.txt', 'r')
lastLine = ""
for line in input_file:
    lastLine = line

print(lastLine)  # This is the last line

or

fileHandle = open('mytext.txt', 'r')
lineList = fileHandle.readlines()
print(lineList[-1])  # This is the last line

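I assume that for this particular case, efficiency may not be worth discussing...

Questions:

1. Which method is faster for picking a random line?

2. Can we work with a concept like "SEEK" in Python (and would it be faster)?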

Answer 1

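If you don't need a uniform distribution (i.e. it's okay that some lines are more likely to be picked than others) and/or if your lines are all roughly the same length, then the problem of picking a random line can be simplified to: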

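1. Determine the size of the file in bytes.
2. Seek to a random position.
3. Search backwards for the previous newline, if any (there is none if there is no preceding line).
4. Take all text up to the next newline or the end of the file, whichever comes first.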

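For (2), you make an educated guess at how far back you need to search to find the previous newline. If you know that a line is n bytes long on average, you can read the preceding n bytes in a single step.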
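A rough sketch of how those steps might look in Python. The function name random_line and the guess parameter (the assumed average line length in bytes) are made up for illustration, and the file is opened in binary mode so that offsets are plain byte counts. Longer lines are more likely to be picked, as noted above.

```python
import os
import random

def random_line(path, guess=128):
    """Return a randomly (not uniformly) chosen line of the file as bytes.

    `guess` is a hypothetical parameter: the assumed average line length.
    """
    with open(path, 'rb') as f:
        size = os.path.getsize(path)      # step 1: file size in bytes
        if size == 0:
            return b''
        pos = random.randrange(size)      # step 2: random byte position
        # Step 3: search backwards for the previous newline,
        # reading `guess` bytes at a time.
        start = pos
        while start > 0:
            chunk_start = max(0, start - guess)
            f.seek(chunk_start)
            chunk = f.read(start - chunk_start)
            nl = chunk.rfind(b'\n')
            if nl != -1:
                start = chunk_start + nl + 1   # line begins right after it
                break
            start = chunk_start               # no newline yet, go further back
        # Step 4: read up to the next newline or the end of the file.
        f.seek(start)
        return f.readline()
```

If the guess is close to the real average line length, the backwards search usually finishes after one or two reads.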

Answer 2

I ran into this problem a few days ago, and this is the solution I use. It is similar to @Frerich Raabe's, but with no randomness, just logic :)

import os

def get_last_line(f):
    """Return the last line of f, a file object opened for reading in binary mode ('rb').

    The algorithm was extracted from a bigger function.
    """
    # fsize was not defined in the extracted snippet; here it is taken from the file itself.
    fsize = os.fstat(f.fileno()).st_size
    tries = 0
    offs = -512

    while tries < 5:
        # Put the cursor abs(offs) bytes before the end of the file.
        # If that is more than the file size, start at the beginning instead
        # (seeking -fsize from the end lands on the first byte).
        f.seek(max(-fsize, offs), os.SEEK_END)
        lines = f.readlines()
        if len(lines) > 1:    # More than one line found, so the last one is complete
            return lines[-1]  # Return the last complete line
        offs *= 2
        tries += 1

    raise ValueError("No line end found after 5 tries (your file may have only 1 line, or the last line is longer than %s bytes)" % (abs(offs) // 2))

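If the file has only one line (or a very long last line), the tries counter keeps the loop from getting stuck. The algorithm tries to get the last line from the last 512 bytes, then 1024, 2048, ... and gives up if there is still no complete line after the fifth iteration.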
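For comparison, here is a sketch of a variant that stops when the window covers the whole file instead of after a fixed number of tries, so a one-line file returns its only line rather than raising. The name tail_line and the block parameter are hypothetical, and binary mode is assumed because Python 3 does not allow end-relative seeks with a nonzero offset on text-mode files.

```python
import os

def tail_line(path, block=512):
    """Return the last line of the file at `path` as bytes (hypothetical helper)."""
    with open(path, 'rb') as f:
        fsize = os.fstat(f.fileno()).st_size
        while True:
            window = min(block, fsize)
            f.seek(-window, os.SEEK_END)   # `window` bytes before the end
            lines = f.readlines()
            # More than one line means the last one is complete. If the
            # window already covers the whole file, what we have is all
            # there is, so return it as well.
            if len(lines) > 1 or window == fsize:
                return lines[-1] if lines else b''
            block *= 2                     # widen the window and retry
```

The trade-off is that a huge file made of a single line gets read in full, which is exactly what the tries limit avoids.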

Efficiently reading a specific line from a file
