Question

I have looked at this, this and this.

The third link seems to have the answer, but it doesn't do the job.

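I can't load the whole file into main memory, because the files I will be using are very large. So I decided to use islice, as shown in the third link. The first two links are not relevant, because they only deal with reading 2 lines or 1000 characters at a time. I need 1000 lines; for now, N is 1000.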
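A minimal sketch of how repeated islice calls behave on one open file object: each call lazily pulls the next N lines and continues where the previous call stopped, so only one chunk is held in memory at a time. Here io.StringIO is an assumed stand-in for open(pathToFile), and the five "x y w" lines are made up for illustration.

```python
import io
from itertools import islice

# Hypothetical stand-in for open(pathToFile): five lines in "x y w" format
f = io.StringIO("".join(f"{i} {i} 1\n" for i in range(5)))

first = list(islice(f, 2))   # lines 0 and 1
second = list(islice(f, 2))  # continues at line 2, not at the start of the file
third = list(islice(f, 2))   # only line 4 is left, so this chunk is short
fourth = list(islice(f, 2))  # file exhausted: empty list

print(first)   # ['0 0 1\n', '1 1 1\n']
print(second)  # ['2 2 1\n', '3 3 1\n']
print(third)   # ['4 4 1\n']
print(fourth)  # []
```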

My file contains 1 million lines:

Sample:

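So if I read 1000 lines at a time, I should go through the while loop 1000 times. But when I print p to see how many passes I have made, it does not stop at 1000. After running my program for 19038838 seconds, it had reached 1400!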
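The expected pass count is a ceiling division: 1,000,000 lines / 1000 lines per pass = 1000 passes, plus one extra short pass whenever the line count is not an exact multiple of N. A small sketch of that arithmetic; expected_passes is a hypothetical helper name.

```python
def expected_passes(total_lines, n):
    """Loop passes needed to read total_lines in chunks of n (integer ceiling division)."""
    return (total_lines + n - 1) // n

print(expected_passes(1_000_000, 1000))  # 1000
print(expected_passes(1_000_001, 1000))  # 1001: one extra, short chunk
```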

CODE:

from itertools import islice

def _parse(pathToFile, N, alg):
    p = 1
    with open(pathToFile) as f:
        while True:
            myList = []
            next_N_lines = islice(f, N)
            if not next_N_lines:
                break
            for line in next_N_lines:
                s = line.split()
                x, y, w = [int(v) for v in s]
                obj = CoresetPoint(x, y)
                Wobj = CoresetWeightedPoint(obj, w)
                myList.append(Wobj)
            a = CoresetPoints(myList)
            client.compressPoints(a)  # This line is not the problem
            print(p)
            p = p+1
    c = client.getTotalCoreset()
    return c

What am I doing wrong?

Answer 1

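As @Ev.kounis said, your while loop does not terminate properly. islice() returns an iterator object, and such an object is always truthy, so "if not next_N_lines" is never true. Once the file is exhausted, every pass just gets an empty slice and the loop keeps going forever.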
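A minimal sketch of a terminating version of the question's loop, assuming the same "x y w" integer format on every line. Each slice is materialized with list(), so an exhausted file gives [], which is falsy and triggers the break. The callback handle is a hypothetical stand-in for client.compressPoints, and plain tuples replace the Coreset classes, which are not shown in the question.

```python
import io
from itertools import islice

def parse(f, n, handle):
    """Read f in chunks of n lines, pass each chunk of (x, y, w) tuples to handle,
    and return the number of passes made (the question's p)."""
    passes = 0
    while True:
        # list() forces the slice; it is [] (falsy) once the file is exhausted
        next_n_lines = list(islice(f, n))
        if not next_n_lines:
            break
        handle([tuple(int(v) for v in line.split()) for line in next_n_lines])
        passes += 1
    return passes

# A bare islice object is truthy even when it will yield nothing
print(bool(islice(iter([]), 5)))        # True
print(bool(list(islice(iter([]), 5))))  # False

received = []
sample = io.StringIO("1 2 3\n4 5 6\n7 8 9\n")
print(parse(sample, 2, received.append))  # 2
print(received)  # [[(1, 2, 3), (4, 5, 6)], [(7, 8, 9)]]
```

The only change from the question's logic is the list() call; memory use stays bounded by one chunk of N lines.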

I suggest using a generator function with yield to fetch the data in chunks, like this:

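def get_line():
    # Generator: yields one line at a time, so the file is never loaded whole
    with open('your file') as file:
        for i in file:
            yield i

lines_required = 1000
gen = get_line()
# Take the next 1000 lines; raises StopIteration if fewer than 1000 remain
chunk = [next(gen) for i in range(lines_required)]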
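The chunk line above fetches only a single chunk and raises StopIteration when fewer than 1000 lines remain. A minimal sketch of repeating it until the input runs out, using the two-argument form iter(callable, sentinel) together with islice; chunks is a hypothetical helper name, not part of the answer, and io.StringIO stands in for the real file.

```python
import io
from itertools import islice

def chunks(iterable, n):
    """Return an iterator over successive lists of up to n items until iterable is exhausted."""
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda and stops as soon as it
    # returns a value equal to the sentinel, here the empty list
    return iter(lambda: list(islice(it, n)), [])

# With the generator above this would be: for chunk in chunks(get_line(), 1000): ...
f = io.StringIO("a\nb\nc\n")
for chunk in chunks(f, 2):
    print(chunk)
# ['a\n', 'b\n']
# ['c\n']
```

The final chunk may be shorter than n, and no StopIteration escapes, because iteration ends at the first empty slice.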
