Question

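A similar question has been asked before, but that was a long time ago. I'm trying to open a very large file (20 GB) so I can manipulate its contents.

I'm using: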

read_path = '../text/'
time = 3600
data = open(read_path+'genomes'+str(time)).read().replace(',','\n').replace('\n','')

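When I pick a smaller file in the same directory (genomes1000), this works fine, but when I change time to the value that matches the larger file, I get an error.

The exact error message is: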

Tempo:analytics scottjg$ python genomeplot.py 
Traceback (most recent call last):
  File "genomeplot.py", line 27, in <module>
    data = open(read_path+'genomes'+str(time)).read().replace(',','\n').replace('\n','')
OSError: [Errno 22] Invalid argument
Thoughts?

Answer 1

Your code reads the entire contents of the file into memory:

open(read_path+'genomes'+str(time)).read()

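I suspect you don't have enough memory available to hold that, which may be why it fails. Wouldn't it be better to call readline in a loop and process the file line by line?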
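As a rough sketch of what that could look like: the snippet below reads the file in fixed-size pieces instead of all at once, and applies the same net transformation as the code in the question (dropping commas and newlines) to each piece. The names iter_cleaned_chunks and count_cleaned_chars and the 1 MiB chunk size are made up for illustration, and it assumes whatever you do next can work on one piece at a time. If the file really does contain line breaks, iterating over the file object (for line in handle) would work just as well.

```python
def iter_cleaned_chunks(path, chunk_size=1024 * 1024):
    """Yield the file's text piece by piece, with commas and newlines removed.

    Hypothetical helper: only about chunk_size characters are held in
    memory at any one time, instead of the whole file.
    """
    with open(path) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty string signals end of file
                break
            # Same net effect as .replace(',', '\n').replace('\n', '')
            yield chunk.replace(',', '').replace('\n', '')


def count_cleaned_chars(path):
    """Example consumer: count the characters left after cleaning."""
    return sum(len(piece) for piece in iter_cleaned_chunks(path))
```

With the variables from the question, you would loop with something like for piece in iter_cleaned_chunks(read_path + 'genomes' + str(time)) and handle each piece inside the loop, rather than building one giant string.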
