Question

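I am writing a Python script that reads two CSV files. The code snippet is below. It works fine when the files contain few records (8,000), but when they contain many records (120,000) I get a MemoryError on the line X_train = X_train.astype('float32').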

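import csv
import cv2
import numpy as np
from keras.utils import np_utils  # helper module from older Keras versions

# nb_classes (the number of label classes) is defined elsewhere in the script.
img_lst_train = []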
label_lst_train = []

img_lst_test = []
label_lst_test = []

print ('Reading training file')

with open('train.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        img = cv2.imread(row[0])
        img_lst_train.append(img) 
        label_lst_train.append(row[1])

print ('Reading testing file')

with open('val.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        img = cv2.imread(row[0])
        img_lst_test.append(img) 
        label_lst_test.append(row[1])



img_lst_train = np.array(img_lst_train)
label_lst_train = np.array(label_lst_train)
img_lst_test = np.array(img_lst_test)
label_lst_test = np.array(label_lst_test)


X_train = img_lst_train
y_train = label_lst_train
X_test  = img_lst_test
y_test  = label_lst_test

# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)


X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

Structure of train.csv and val.csv

path to image file, label
path to image file, label
path to image file, label
.........................

How can I rewrite the code above to avoid the MemoryError?

Answer 1

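NumPy's astype method accepts a copy parameter. If it is set to False, astype returns the input array itself instead of a copy, provided the array already has the requested dtype. When the dtype changes, as it does from uint8 to float32, a new array is still allocated, so this alone may not be enough. In the code: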

X_train = X_train.astype('float32', copy=False)
X_test = X_test.astype('float32', copy=False)
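
A quick way to see what copy=False actually does is to test it on a small array. This is only a sketch: the array shape is an arbitrary stand-in for image data, and uint8 is assumed because that is what cv2.imread normally returns. According to the NumPy documentation, the input array is returned only when it already satisfies the requested dtype; otherwise a new array is allocated.

```python
import numpy as np

# Small stand-in for a batch of images (shape chosen arbitrarily).
X = np.zeros((4, 8, 8, 3), dtype=np.uint8)

# dtype already matches: astype hands back the very same array.
same = X.astype('uint8', copy=False)

# dtype differs: a new float32 array is allocated despite copy=False.
converted = X.astype('float32', copy=False)

print(same is X)                       # True
print(np.shares_memory(converted, X))  # False
print(converted.nbytes // X.nbytes)    # 4
```

While the conversion runs, both the uint8 original and the float32 result are held in memory at once.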

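If you still run out of memory at some point, you could also read your training, validation, and test sets one after another rather than all at the same time. Keep in mind that cv2.imread returns uint8 arrays, so a float32 array takes four times as much space as the original; handling one set at a time and releasing whatever is no longer needed can make the difference.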
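
One way to avoid holding a uint8 copy and a float32 copy at the same time is to allocate the float32 array once and fill it image by image while reading the CSV. The sketch below is a suggestion under assumptions, not code from the question: load_set, read_image, and image_shape are hypothetical names, and every image is assumed to have the same shape.

```python
import csv
import numpy as np

def load_set(csv_path, read_image, image_shape):
    """Read (path, label) rows and fill a preallocated float32 array.

    read_image is any callable mapping a path to an array of shape
    image_shape (for example cv2.imread); hypothetical helper.
    """
    with open(csv_path, newline='') as f:
        rows = [row for row in csv.reader(f) if row]

    # One allocation for the whole set, already in the target dtype.
    X = np.empty((len(rows),) + tuple(image_shape), dtype=np.float32)
    labels = []
    for i, row in enumerate(rows):
        # Assignment converts the uint8 pixels to float32 in place,
        # so only one image at a time exists in its original dtype.
        X[i] = read_image(row[0])
        labels.append(row[1].strip())
    return X, np.array(labels)
```

It could be called once per file, for example load_set('train.csv', cv2.imread, (height, width, 3)) with height and width set to the image size. If the full float32 array still does not fit in memory, the data would have to be loaded in batches instead.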

Code rewrite - MemoryError