Large CSV file causes a segmentation fault in numpy.genfromtxt

Asked: 2015-05-01 00:55:02

Tags: python csv numpy pickle

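I want to create a numpy array from a CSV file, but I run into trouble when the file is around 50k lines long (like the MNIST training set). The file I am trying to import looks like this: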

0.0,0.0,0.0,0.5,0.34,0.24,0.0,0.0,0.0
0.0,0.0,0.0,0.4,0.34,0.2,0.34,0.0,0.0
0.0,0.0,0.0,0.34,0.43,0.44,0.0,0.0,0.0
0.0,0.0,0.0,0.23,0.64,0.4,0.0,0.0,0.0

For a file that is 10k lines long, such as the validation set, this works fine:

import numpy as np
csv = np.genfromtxt("MNIST_valid_set_data.csv", delimiter=",")

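If I do the same thing with the training data (the larger file), I get a C-style segmentation fault. Does anyone know a better way to handle this, other than splitting the file into pieces and stitching the resulting arrays back together?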
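For illustration, here is a minimal sketch of one alternative to `np.genfromtxt`: reading the file one row at a time with the standard-library `csv` module, so that the whole file is never parsed in a single call. It assumes Python 3 and a purely numeric, comma-separated file with no header. The function name `load_csv_rows` and the `float32` dtype are hypothetical choices, and whether this avoids the crash described above has not been verified.

```python
import csv

import numpy as np


def load_csv_rows(path, dtype=np.float32):
    """Read a purely numeric CSV file row by row into a 2-D array.

    Hypothetical helper: assumes every non-blank line has the same
    number of comma-separated numeric fields and there is no header.
    """
    rows = []
    with open(path, newline="") as f:
        for record in csv.reader(f):
            if not record:
                continue  # skip blank lines
            # Convert each row to a compact numpy array right away so the
            # intermediate list of Python floats is discarded immediately.
            rows.append(np.array([float(x) for x in record], dtype=dtype))
    # Stack the per-row arrays into one (n_rows, n_cols) array.
    # np.vstack raises ValueError if the file contained no data rows.
    return np.vstack(rows)
```

It would be called as `train = load_csv_rows("MNIST_train_set_data.csv")`, where the file name is a placeholder modeled on the validation file above.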

The end goal is to pickle the arrays into a file similar to mnist.pkl.gz, but I can't do that if I can't read the data in the first place.
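As a sketch of that last step, the snippet below writes an object to a gzip-compressed pickle and reads it back, using only the standard `gzip` and `pickle` modules. The helper names `save_pickle_gz` and `load_pickle_gz` are hypothetical, and the object passed in (a single array, or a tuple of arrays) is up to the caller; nothing here is taken from how the original mnist.pkl.gz was produced.

```python
import gzip
import pickle

import numpy as np


def save_pickle_gz(obj, path):
    """Pickle obj into a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Load an object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

For example, `save_pickle_gz(train, "my_mnist.pkl.gz")` followed by `load_pickle_gz("my_mnist.pkl.gz")` should give back an equal array; both the variable and the file name are placeholders. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.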

Any help is greatly appreciated.

0 answers:

No answers yet.