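I am trying to create a numpy array from a csv file, but I run into a problem when the file is about 50k lines long (like the MNIST training set). The file I am trying to import looks like this: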
0.0,0.0,0.0,0.5,0.34,0.24,0.0,0.0,0.0
0.0,0.0,0.0,0.4,0.34,0.2,0.34,0.0,0.0
0.0,0.0,0.0,0.34,0.43,0.44,0.0,0.0,0.0
0.0,0.0,0.0,0.23,0.64,0.4,0.0,0.0,0.0
For a file about 10k lines long, such as the validation set, this works fine:
import numpy as np
csv = np.genfromtxt("MNIST_valid_set_data.csv", delimiter=",")
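If I do the same thing with the training data (the larger file), I get a C-style segmentation fault. Does anyone know of a better way than splitting the file up and then piecing the parts back together?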
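For reference, a minimal sketch of the split-and-stitch workaround mentioned above, done in memory rather than by physically breaking up the file. The helper name `load_csv_in_chunks` and the block size of 10000 rows are illustrative assumptions, and it swaps `np.genfromtxt` for `np.loadtxt` on small blocks of lines; it has not been confirmed to avoid the segmentation fault.

```python
import itertools

import numpy as np


def load_csv_in_chunks(path, chunk_rows=10000, delimiter=","):
    """Parse a purely numeric CSV in blocks of lines and stack the blocks.

    Hypothetical helper: the name and the default chunk_rows are assumptions.
    """
    chunks = []
    with open(path) as f:
        while True:
            # Pull at most chunk_rows raw lines from the file.
            raw = list(itertools.islice(f, chunk_rows))
            if not raw:
                break  # end of file
            # Drop blank lines so np.loadtxt only sees data rows.
            lines = [ln for ln in raw if ln.strip()]
            if lines:
                # ndmin=2 keeps a one-row block two-dimensional.
                chunks.append(np.loadtxt(lines, delimiter=delimiter, ndmin=2))
    # Raises ValueError if the file contained no data rows at all.
    return np.vstack(chunks)
```

Calling `load_csv_in_chunks("MNIST_valid_set_data.csv")` should give the same array as the `np.genfromtxt` call above; the training file would be loaded the same way under whatever name it has.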
The end goal is to pickle the arrays into a file similar to mnist.pkl.gz, but I cannot do that if I cannot read the data in the first place.
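To make the end goal concrete, here is a minimal sketch of writing an array to a gzip-compressed pickle and reading it back, using only the standard `gzip` and `pickle` modules. The helper names `save_pickle_gz` and `load_pickle_gz` are hypothetical, and nothing is assumed here about how the arrays inside mnist.pkl.gz are arranged.

```python
import gzip
import pickle

import numpy as np


def save_pickle_gz(obj, path):
    """Pickle obj into a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Load an object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Once the data is in memory, something like `save_pickle_gz(data, "my_mnist.pkl.gz")` would write it out, where `data` and the output file name are placeholders.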
Any help is greatly appreciated.