Question

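I'm trying to preprocess a large .txt file, about 12 GB.
The code below fails with an "Invalid argument" error. I think this is because the data is too large.
Is there any way to read a file this large?
Do I actually need this much data to train word vectors?
Or is something else causing the error?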

with open('data/text8') as f:
    text = f.read()

Answer 1

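Depending on what kind of text processing you plan to do, reading the file one line at a time may be enough. This way only the current line is held in memory, not the whole file: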

with open("data/text8", "r") as f:  # 'with' closes the file even if an error occurs
    for line in f:
        # process the string 'line' as desired (it's a single line of the document you opened)
        pass
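Reading line by line only keeps memory low if the file actually contains newlines. A file stored as one very long line would still be loaded into memory all at once. A minimal sketch of an alternative, assuming the file is UTF-8 and that whitespace-separated tokens are what you want to feed into training, is to read fixed-size chunks and yield one word at a time. The function name iter_words and the chunk_size default are illustrative choices, not part of any library.

```python
def iter_words(path, chunk_size=1 << 20):
    """Yield whitespace-separated tokens from a text file.

    Reads at most chunk_size characters per call (assumption: UTF-8 text),
    so memory use stays bounded even if the file has no newlines at all.
    """
    leftover = ""  # partial token carried over from the previous chunk
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            chunk = leftover + chunk
            words = chunk.split()
            # If the chunk does not end in whitespace, its last token may be
            # cut in half; hold it back and prepend it to the next chunk.
            if words and not chunk[-1].isspace():
                leftover = words.pop()
            else:
                leftover = ""
            yield from words
    if leftover:  # the final token in the file
        yield leftover
```

For example, counting tokens without ever loading the whole file:

```python
count = sum(1 for _ in iter_words("data/text8"))
```

Because iter_words is a generator, you can pass it (or batches drawn from it) to any training loop that consumes an iterable of words.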