How to resolve a memory error when loading a large number of CSV files

Date: 2016-04-06 17:30:51

Tags: python csv numpy image-processing

I created a CSV file to store my data like this:

import numpy as np
from scipy.io import loadmat

img = loadmat('test.mat')  # note: loadmat returns a dict; the image array must be pulled out by its variable name
a, b, c = np.shape(img)    # c is the no. of slices

train_data01 = []  # collects the per-slice feature arrays
for i in range(c):
    feature_1 = ...  # kurtosis of image slice
    feature_2 = ...  # entropy of image slice
    # ... feature_3 through feature_7 computed the same way ...
    feature_8 = ...  # skewness of image slice
    # stack the eight feature columns side by side for this slice
    train_data = np.concatenate((feature_1, feature_2, ..., feature_8), axis=1)
    # this loop repeats for all the slices and each result is appended to another array
    train_data01.append(train_data)
    # this gives a list with all the data along one axis

feature_img = np.concatenate(train_data01, axis=0)
# this gives an array of size (m x 8), where m is the number of pixels in the image
np.savetxt('savedata.csv', feature_img, fmt='%4.2f')  # save the data in a CSV file
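
For reference, here is a small sketch of what np.savetxt writes with this format (a toy 2x8 array standing in for feature_img, saved to a hypothetical 'toy.csv' so the real file is untouched): each array row becomes one text line of space-separated values, with no commas, which matters when reading it back later.

import numpy as np

toy = np.arange(16, dtype='float32').reshape(2, 8)  # stand-in for feature_img
np.savetxt('toy.csv', toy, fmt='%4.2f')

with open('toy.csv') as f:
    print(f.readline())  # '0.00 1.00 2.00 ... 7.00' -- spaces, not commas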

Now I am trying to retrieve the data like this:

import csv
import numpy as np

hg01 = []
with open('savedata.csv', 'r', newline='') as f:  # text mode for csv.reader (Python 3)
    reader = csv.reader(f)
    for row in reader:
        hg01.append(row)

# each row is a one-element list holding the whole line as one string,
# so this yields a 1-D array of strings, one entry per line
hg01 = np.concatenate(hg01, axis=0)
hg01_feature = np.empty(shape=(np.shape(hg01)[0], 80), dtype='float32')

for i in range(np.shape(hg01)[0]):
    a = hg01[i]
    b = np.fromstring(a, dtype='float32', sep=' ')  # parse the space-separated values
    hg01_feature[i, :] = b  # assign in place; the original np.append call built a new array and discarded it

The problem now is that the actual size of the hg01 numpy array is 1520640x80, and its size is ~6.4 MB. After the 6th patient my system starts to slow down, and while loading the 7th patient's data it shows a memory error at the following line:

hg07 = np.concatenate(hg07, axis=0)

So, I have about 40 patients' worth of data; how can I load all of it? I am using a random forest for image classification. My system has 8 GB of RAM.
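
For scale, here is a rough back-of-the-envelope sketch using only the shapes stated above (assuming float32 arrays and the '%4.2f' text format; the per-line character count is a guess):

# rough memory estimate for the numbers given in the question
rows, cols = 1520640, 80

# final float32 array per patient
per_patient = rows * cols * 4 / 1024**3                           # ~0.45 GB
print('float32 array per patient: %.2f GB' % per_patient)
print('40 patients as float32:    %.1f GB' % (40 * per_patient))  # ~18 GB, well over 8 GB of RAM

# before conversion, each line is held as a Python string inside a list;
# at roughly 5 characters per '%4.2f' value plus a separator, the raw text
# alone is large, and Python object overhead multiplies it further
per_line_chars = cols * 6
print('raw text per patient (approx): %.2f GB' % (rows * per_line_chars / 1024**3))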

1 Answer:

Answer 0 (score: 0)

Each of your rows is a string. I think you can get the behaviour you want by splitting each string. The split is slightly more complicated, because I was only approximately right: each row is actually a one-element list containing a single string, so you have to take its [0] element. The split should look something like this:

new_feature = [ f[0].split() for f in feature ]

Now take np.shape of new_feature; hopefully that gives you what you want.
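
To make that concrete, here is a minimal sketch using the question's own names (hg01 straight from csv.reader, and the 80 columns stated in the question):

import numpy as np

# hg01 as produced by csv.reader: a list of one-element lists,
# each holding a whole space-separated line as a single string
rows = [f[0].split() for f in hg01]             # split each line into its values
hg01_feature = np.array(rows, dtype='float32')  # -> shape (n_rows, 80)
print(np.shape(hg01_feature))

Since the file was written by np.savetxt with whitespace separators, np.loadtxt('savedata.csv', dtype='float32') would also read it back directly, without going through csv.reader at all.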