Opening specific lines in multiple files

Time: 2016-04-05 14:15:06

Tags: python list file-io cursor iteration

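I'm trying to open specific lines of multiple files and return those lines for each file. My solution takes a lot of time. Do you have any suggestions?

func.filename: the name of the given file
func.start_line: the starting line in the given file
func.end_line: the ending line in the given file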

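from sys import stderr


def method_open(func):
    # Return lines [func.start_line, func.end_line) of func.filename,
    # or an empty list if the file cannot be opened.
    try:
        with open(func.filename) as f:
            body = f.readlines()[func.start_line:func.end_line]
    except IOError:
        body = []
        stderr.write("\nCouldn't open the referenced method inside {0}".format(func.filename))
        stderr.flush()
    return body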

Keep in mind that several calls sometimes refer to the same func.filename, but unfortunately most of the time they don't.
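When several requests do point at the same file, one option (not part of the original question or answer) is to group the requested ranges by filename so each file is opened only once and read in a single pass. This is a sketch under assumptions: read_ranges is a hypothetical helper, and each request is assumed to be a (filename, start_line, end_line) tuple with 0-based, end-exclusive line numbers.

```python
from collections import defaultdict


def read_ranges(requests):
    # Hypothetical helper: `requests` is an iterable of
    # (filename, start_line, end_line) tuples, 0-based and end-exclusive.
    by_file = defaultdict(set)
    for filename, start, end in requests:
        by_file[filename].add((start, end))  # duplicate ranges collapse

    results = {}
    for filename, ranges in by_file.items():
        last = max(end for _, end in ranges)
        collected = {r: [] for r in ranges}
        with open(filename) as f:  # each distinct file is opened once
            for i, line in enumerate(f):
                if i >= last:
                    break  # no requested range reaches this far
                for start, end in ranges:
                    if start <= i < end:
                        collected[(start, end)].append(line)
        for (start, end), lines in collected.items():
            results[(filename, start, end)] = lines
    return results
```

The result is a dict keyed by the same (filename, start_line, end_line) tuples, so callers can look up each request's lines after a single call.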

1 answer:

Answer 0 (score: 2)

The problem with readlines is that it reads the entire file into memory, and linecache does the same.

You can save some time by reading one line at a time and breaking out of the loop as soon as you reach func.end_line.
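A minimal sketch of that early-exit loop, assuming 0-based start_line and end_line with end_line exclusive, like the slice in the question (read_range_loop is a hypothetical name):

```python
def read_range_loop(filename, start_line, end_line):
    # Return lines start_line .. end_line - 1 (0-based), stopping early.
    result = []
    with open(filename) as f:
        for i, line in enumerate(f):
            if i >= end_line:
                break  # everything needed has been read; skip the rest of the file
            if i >= start_line:
                result.append(line)
    return result
```

Unlike readlines(), nothing after end_line is iterated, so the cost depends on where the range ends rather than on the file's total size.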

But the best approach I found is itertools.islice.
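itertools.islice(iterable, start, stop) yields items start through stop - 1 and then stops, so on a file object it stops iterating once the last wanted line has been read. A minimal sketch (read_range_islice is a hypothetical name; line numbers are 0-based, end-exclusive):

```python
import itertools


def read_range_islice(filename, start_line, end_line):
    # Return lines start_line .. end_line - 1 (0-based) using itertools.islice.
    with open(filename) as f:
        # islice consumes the file lazily and stops at end_line.
        # list() must run inside the with block, before the file is closed.
        return list(itertools.islice(f, start_line, end_line))
```

Lines before start_line still have to be read and discarded; islice just does that skipping inside CPython's C implementation instead of in a Python-level loop.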

Here are the results of some tests I ran on a 130 MB file with ~9701k lines:

--- 1.43700003624 seconds --- f_readlines
--- 1.00099992752 seconds --- f_enumerate
--- 1.1400001049 seconds --- f_linecache
--- 0.0 seconds --- f_itertools_islice

(A time of 0.0 only means the call finished faster than time.time() can resolve.) Here is the script I used:

import time
import linecache
import itertools


def f_readlines(filename, start_line, endline):
    # Reads the whole file into memory, then slices out the wanted lines.
    with open(filename) as f:
        return f.readlines()[start_line:endline]


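def f_enumerate(filename, start_line, endline):
    # Reads line by line and stops once endline is reached.
    result = []
    with open(filename) as f:
        for i, line in enumerate(f):
            # Under Python 2, range() builds a new list on every iteration;
            # start_line <= i < endline is a cheaper equivalent test.
            if i in range(start_line, endline):
                result.append(line)
            if i >= endline:
                break
    return result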


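def f_linecache(filename, start_line, endline):
    # linecache loads and caches the whole file on first access.
    # getline() is 1-based, so n + 1 selects the same lines as 0-based slicing.
    result = []
    for n in range(start_line, endline):
        result.append(linecache.getline(filename, n + 1))
    return result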


def f_itertools_islice(filename, start_line, endline):
    # islice skips to start_line and stops at endline; the rest of the file is never iterated.
    with open(filename) as f:
        return list(itertools.islice(f, start_line, endline))


def runtest(func_to_test):
    filename = "testlongfile.txt"
    start_line = 5000
    endline = 10000
    start_time = time.time()
    func_to_test(filename, start_line, endline)
    print("--- %s seconds --- %s" % ((time.time() - start_time), func_to_test.__name__))

runtest(f_readlines)
runtest(f_enumerate)
runtest(f_linecache)
runtest(f_itertools_islice)
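Applied to the question's method_open, the islice approach could look like the sketch below. It keeps the question's func attributes (filename, start_line, end_line) and error message; anything used to construct func in the note that follows is a hypothetical stand-in for whatever object the question actually passes.

```python
import itertools
from sys import stderr


def method_open(func):
    # Return lines [func.start_line, func.end_line) of func.filename,
    # reading only as far as func.end_line instead of the whole file.
    try:
        with open(func.filename) as f:
            return list(itertools.islice(f, func.start_line, func.end_line))
    except IOError:
        stderr.write("\nCouldn't open the referenced method inside {0}".format(func.filename))
        stderr.flush()
        return []
```

For example, with a hypothetical Func = collections.namedtuple("Func", "filename start_line end_line"), method_open(Func("testlongfile.txt", 5000, 10000)) returns the same lines as the original readlines() slice. Unlike a slice, islice does not accept negative indices.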