Question

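I have a very large text file. I want to search for the last occurrence of a specific word and then perform some operations on the lines that follow it.

I can do something like:

if "word" in line.split():
    pass  # do something

But I am only interested in the last occurrence of "word".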

Answer 1

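A simpler and quicker solution is to read the file's lines in reverse order and search for the first line that contains the word.

In Python 2.6 you can do something like this (where word is the string you are looking for):

with open("filename") as f:
    for line in reversed(f.readlines()):
        if word in line:
            # Do the operations here when you find the line
            break  # the first hit in reverse order is the last occurrence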
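To tie this back to the question, here is a minimal sketch that finds the last line containing the word and returns the lines after it. The name lines_after_last and its parameters are made up for illustration, and because it calls readlines() it loads the whole file into memory, so it only suits files that fit in RAM.

```python
def lines_after_last(path, word):
    """Return the lines that follow the last line containing `word`.

    Hypothetical helper: loads every line into memory, so it is only
    suitable when the file fits in RAM.
    """
    with open(path) as f:
        lines = f.readlines()
    # Walk the indices backwards; the first hit is the last occurrence.
    for i in range(len(lines) - 1, -1, -1):
        if word in lines[i]:
            return lines[i + 1:]
    return []  # word not found
```

Returning an empty list when the word is missing is just one possible choice; raising an exception would work equally well.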

Answer 2

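Try this:

with open('file.txt', 'r') as f:
    lines = f.read()
answer = lines.find('word')  # index of the first occurrence, or -1

Then you can pick out the last occurrence from this. Note that find returns the index of the first match, so reaching the last one this way means searching again from each match onward.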

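You can also use str.rfind:

str.rfind(sub[, start[, end]])

Return the highest index in the string where substring sub is found, such that sub is contained within s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.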
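As a small illustration of how rfind behaves and how it can answer the original question, the sketch below uses a made-up in-memory string in place of the file contents. The variable names are only illustrative, and the slicing step assumes the line holding the match ends with a newline.

```python
text = "alpha word\nbeta\ngamma word\ndelta\nepsilon\n"

first = text.find("word")     # index of the first occurrence
last = text.rfind("word")     # index of the last occurrence
missing = text.rfind("nope")  # -1 when the substring is absent

# Everything after the line that holds the last occurrence
# (assumes that line is terminated by "\n").
end_of_line = text.find("\n", last)
following = text[end_of_line + 1:].splitlines()
print(first, last, missing, following)
```

For a real file, text would come from f.read(), which again means the whole file has to fit in memory.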

Answer 3

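You can open the file, convert it to a list, reverse its order, and iterate over it looking for your word.

with open('file.txt', 'r') as file_:
    line_list = list(file_)
    line_list.reverse()

    for line in line_list:
        if line.find('word') != -1:
            # do something
            print(line)
            break  # stop at the last occurrence

You can optionally set the size of the file buffer by passing the buffer size (in bytes) as the third argument to open. For example: with open('file.txt','r', 1024) as file_: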

Answer 4

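If the file is hundreds of megabytes or even gigabytes in size, you may want to use mmap so that you do not have to read the whole file into memory. The rfind method finds the last occurrence of a string in the file.

import mmap

with open('large_file.txt', 'rb') as f:
    # memory-map the file, size 0 means whole file
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                          # access= works on both *nix and Windows

    i = m.rfind(b'word')  # search for last occurrence of 'word'
    if i != -1:           # rfind returns -1 when nothing is found
        m.seek(i)             # seek to the location
        line = m.readline()   # read to the end of the line
        print(line)
        nextline = m.readline()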

Just keep calling readline() to read the following lines.
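One way to "keep calling readline()" is to loop until it returns an empty bytes object, which is what mmap's readline() gives at the end of the file. The sketch below wraps that in a hypothetical helper, lines_after_last_mmap; it assumes a non-empty file (mapping an empty file raises an error) and takes the search string as bytes.

```python
import mmap

def lines_after_last_mmap(path, needle):
    """Return the lines (as bytes) after the line with the last `needle`.

    Hypothetical helper; `needle` must be bytes, e.g. b"word".
    """
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            i = m.rfind(needle)
            if i == -1:
                return []  # needle not found
            m.seek(i)
            m.readline()  # consume the rest of the matching line
            # readline() returns b"" at end of file, which stops iter().
            return list(iter(m.readline, b""))
        finally:
            m.close()
```

Collecting the lines into a list is only for convenience here; for a very large tail it would be better to process each line inside the loop.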

If the file is extremely large (tens of gigabytes, say), you can map it in chunks using the length and offset arguments of mmap().
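A rough sketch of that chunked approach follows. The helper name rfind_in_chunks and the default chunk size are assumptions for illustration, not something from the answer. It maps one window at a time starting from the end of the file, rounds the window size to mmap.ALLOCATIONGRANULARITY (offsets passed to mmap must be multiples of it), and overlaps windows slightly so a match that straddles a boundary is not missed.

```python
import mmap
import os

def rfind_in_chunks(path, needle, chunk_size=64 * 1024 * 1024):
    """Return the byte offset of the last `needle` (bytes) in the file, or -1.

    Hypothetical helper: maps one window of the file at a time, starting
    from the end, instead of mapping the whole file at once.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap offsets must be multiples of the allocation granularity.
    chunk = max(gran, chunk_size // gran * gran)
    size = os.path.getsize(path)
    if size == 0 or not needle:
        return -1
    with open(path, "rb") as f:
        start = (size - 1) // chunk * chunk  # start of the last window
        while start >= 0:
            # Extend each window by len(needle) - 1 bytes so a match that
            # straddles the boundary with the next window is still seen.
            length = min(chunk + len(needle) - 1, size - start)
            m = mmap.mmap(f.fileno(), length,
                          access=mmap.ACCESS_READ, offset=start)
            try:
                i = m.rfind(needle)
            finally:
                m.close()
            if i != -1:
                return start + i
            start -= chunk
    return -1
```

Once the offset is known, an ordinary f.seek(offset) followed by readline() calls on the file object reads the matching line and the lines after it.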

Find the last occurrence of a word in a large file using Python