Question

I have a 16 GB text file with the following structure:

name1 1 0 1 0 1 1 1 0 0 1 
...
...
nameN 1 1 1 0 1 0 -1 1 0 1

I would like to process the file like this:

my_dict = {}

for line in data.split("\n"):
    cells = line.split()
    my_dict[cells[0]] = [int(x) for x in cells[1:]]

The problem is that if I open the file like this:

with open(data) as f:
    content = f.readlines()

I get:

'list' object has no attribute 'split'

Is there a way to open the file so that I can do the processing above?

Answer 1

The best way is to iterate over the file object line by line:

with open(data) as f:
    for line in f:
        cells = line.strip().split()
        # do something

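This way you don't hold the 16 GB of data in memory several times over (this works in both Python 2 and Python 3). You should also try to avoid keeping the complete dictionary in memory.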
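One way to avoid building the full dictionary is to wrap the loop in a generator and consume the rows one at a time. This is only a rough sketch: the name iter_rows is made up for illustration, and skipping blank lines is an assumption about what you want when a line has no cells.

```python
def iter_rows(path):
    """Yield (name, values) pairs lazily, one line of the file at a time."""
    with open(path) as f:
        for line in f:
            cells = line.split()  # split() with no argument also drops the trailing "\n"
            if not cells:
                # Assumption: blank lines carry no data, so skip them
                # (otherwise cells[0] would raise IndexError).
                continue
            yield cells[0], [int(x) for x in cells[1:]]
```

You would then write something like `for name, values in iter_rows(data): ...` and only one row lives in memory at a time. If you really do need random access by name later, you will have to store something, but then it is worth asking whether you need every row.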

Answer 2

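This is because f.readlines() gives you a list of strings, one per line. The content has already been split on "\n", so when you do your processing, loop over the list directly instead of splitting on newlines: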

my_dict = {}

for line in content:
    cells = line.split()
    my_dict[cells[0]] = [int(x) for x in cells[1:]]
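To see what is going on, here is a small self-contained illustration. It uses io.StringIO as a stand-in for the real file (that substitution and the two sample rows are mine, not from the question), and the `if cells` guard is an assumed way to tolerate blank lines.

```python
import io

# Stand-in for the real 16 GB file: a tiny in-memory text stream.
fake_file = io.StringIO("name1 1 0 1\nname2 1 -1 0\n")
content = fake_file.readlines()

# content is a list, one string per line, each still ending in "\n".
# Calling content.split("\n") is what raises the AttributeError.

my_dict = {}
for line in content:
    cells = line.split()  # whitespace split also discards the trailing "\n"
    if cells:             # assumed guard against empty lines
        my_dict[cells[0]] = [int(x) for x in cells[1:]]
```

Keep in mind that readlines() still reads every line into memory at once, so for a file this size iterating over the file object directly is the safer choice.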