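For several days I have been scratching my head over this OOM error; I am new to Keras. I have tried subsampling the data, reducing the batch size, and removing layers from the 3D U-Net, but nothing has worked. I am using the LIDC-IDRI dataset of CT scans from 1010 patients. After preprocessing, I save volumes of shape 64x64x64 to disk, extracted from whole CT scans resampled to 256x256x256 (I first tried training on whole CT scans, but after getting OOM I switched to 64-cubed volumes). Each patient yields 64 volumes of shape 64x64x64, for a total of 64,640 samples on which I have to train my 3D U-Net.
Here is the Keras code for my model: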
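To make the sample count concrete, here is a minimal sketch of how a 256x256x256 scan tiles into 64 cubes of 64x64x64. It assumes non-overlapping tiling, which is consistent with the 64 volumes per patient mentioned above but may not be exactly how the preprocessing was done; `split_into_cubes` is a hypothetical helper, not part of the original code.

```python
import numpy as np

def split_into_cubes(volume, size=64):
    """Tile a 3D volume into non-overlapping cubes of edge length `size`.

    Hypothetical helper for illustration; assumes each axis is a
    multiple of `size`.
    """
    d, h, w = volume.shape
    assert d % size == 0 and h % size == 0 and w % size == 0
    # Split every axis into (number of blocks, block size) ...
    blocks = volume.reshape(d // size, size, h // size, size, w // size, size)
    # ... bring the three block indices to the front ...
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    # ... and flatten them into a single "cube index" axis.
    return blocks.reshape(-1, size, size, size)

volume = np.zeros((256, 256, 256), dtype=np.float32)
cubes = split_into_cubes(volume)
print(cubes.shape)            # (64, 64, 64, 64): 4 * 4 * 4 cubes per scan
print(1010 * cubes.shape[0])  # 64640 samples over 1010 patients
```

The first axis of `cubes` is the cube index, so each `cubes[i]` is one 64x64x64 training sample.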
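# Imports assumed for standalone Keras 2.x (not shown in the original post)
import os
import numpy as np
from numpy import expand_dims
from keras.models import Model
from keras.layers import (Input, Conv3D, Conv3DTranspose, MaxPooling3D,
                          Dropout, BatchNormalization, Activation, concatenate)
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator

im_width = 64
im_height = 64
im_depth = 64
path_train = 'D:/LIDC-IDRI-Dataset/'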

def npz_volume_generator(inputPath, bs, mode="train", aug=None):
    batch_start_index = 0
    patients = os.listdir(inputPath + "images")
    # loop indefinitely
    while True:
        # initialize our batches of scans and masks
        scan_pixels = []
        mask_pixels = []
        # keep looping until we reach our batch size
        for id_ in range(batch_start_index, batch_start_index+bs):
            # attempt to read the next sample from path
            scan_pixel = np.zeros((im_depth, im_width, im_height))
            scan_pixel = np.load(inputPath + 'images/' + patients[id_])['arr_0']
            mask_pixel = np.zeros((im_depth, im_width, im_height))
            mask_pixel = np.load(inputPath + 'masks/' + patients[id_])['arr_0']
            # check to see if we have reached the end of our samples
            if(batch_start_index >= len(patients)):
                # reset the batch start index to the beginning of our samples
                batch_start_index -= len(patients)
                # if we are evaluating we should now break from our
                # loop to ensure we don't continue to fill up the
                # batch from samples from the beginning
                if mode == "eval":
                    break
            # update our corresponding batch lists
            scan_pixels.append(scan_pixel)
            mask_pixels.append(mask_pixel)
        batch_start_index += bs
        if(batch_start_index >= len(patients)):
            batch_start_index -= len(patients)
        # if the data augmentation object is not None, apply it
        if aug is not None:
            (scan_pixels, mask_pixels) = next(aug.flow(np.array(scan_pixels),
                                                       np.array(mask_pixels), batch_size=bs))
        # Re-shaping and adding a channel dimension (5D Tensor)
        # batch_size, length, breadth, height, channel [None,im_width,im_height,im_depth,1]
        # yield the batch to the calling function
        yield (np.array(expand_dims(scan_pixels, axis=4)), np.array(expand_dims(mask_pixels, axis=4)))

def conv3d_block(input_tensor, n_filters, kernel_size=3, batchnorm=True):
    # first layer
    x = Conv3D(filters=n_filters, kernel_size=(kernel_size, kernel_size, kernel_size),
               kernel_initializer="he_normal", padding="same")(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation("relu")(x)
    # second layer
    x = Conv3D(filters=n_filters, kernel_size=(kernel_size, kernel_size, kernel_size),
               kernel_initializer="he_normal", padding="same")(x)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation("relu")(x)
    return x

def get_unet(input_img, n_filters=16, dropout=0.5, batchnorm=True):
    # contracting path
    c1 = conv3d_block(input_img, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling3D((2, 2, 2))(c1)
    p1 = Dropout(dropout*0.5)(p1)

    c2 = conv3d_block(p1, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)
    p2 = MaxPooling3D((2, 2, 2))(c2)
    p2 = Dropout(dropout)(p2)

    c3 = conv3d_block(p2, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)
    p3 = MaxPooling3D((2, 2, 2))(c3)
    p3 = Dropout(dropout)(p3)

    c4 = conv3d_block(p3, n_filters=n_filters*16, kernel_size=3, batchnorm=batchnorm)

    # expansive path
    u5 = Conv3DTranspose(n_filters*8, (3, 3, 3), strides=(2, 2, 2), padding='same')(c4)
    u5 = concatenate([u5, c3])
    u5 = Dropout(dropout)(u5)
    c5 = conv3d_block(u5, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)

    u6 = Conv3DTranspose(n_filters*4, (3, 3, 3), strides=(2, 2, 2), padding='same')(c5)
    u6 = concatenate([u6, c2])
    u6 = Dropout(dropout)(u6)
    c6 = conv3d_block(u6, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)

    u7 = Conv3DTranspose(n_filters*2, (3, 3, 3), strides=(2, 2, 2), padding='same')(c6)
    u7 = concatenate([u7, c1])
    u7 = Dropout(dropout)(u7)
    c7 = conv3d_block(u7, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)

    outputs = Conv3D(1, (1, 1, 1), activation='sigmoid')(c7)
    model = Model(inputs=[input_img], outputs=[outputs])
    return model

# initialize the number of epochs to train for and batch size
NUM_EPOCHS = 50
BS = 8

# initialize the total number of training and testing image
NUM_TRAIN_IMAGES = len(os.listdir(path_train + 'images/'))
NUM_TEST_IMAGES = len(os.listdir(path_train + 'test/'))

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
                         width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
                         horizontal_flip=True, fill_mode="nearest")

# initialize both the training and testing image generators
trainGen = npz_volume_generator(path_train, BS, mode="train", aug=aug)
testGen = npz_volume_generator(path_train, BS, mode="train", aug=None)

# initialize our Keras model and compile it
model = get_unet(Input((im_depth, im_width, im_height, 1)), n_filters=16, dropout=0.05, batchnorm=True)
print(model.summary())
model.compile(optimizer=Adam(), loss="binary_crossentropy", metrics=["accuracy"])

# train the network
print("[INFO] training w/ generator...")
H = model.fit_generator(trainGen, steps_per_epoch=NUM_TRAIN_IMAGES // BS,
                        validation_data=testGen, validation_steps=NUM_TEST_IMAGES // BS,
                        epochs=NUM_EPOCHS)

The output shows two problems. The first is this warning:
\Anaconda3\lib\site-packages\keras_preprocessing\image\numpy_array_iterator.py:127: UserWarning: NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3, or 4 channels on axis 3. However, it was passed an array with shape (8, 64, 64, 64) (64 channels). str(self.x.shape[channels_axis]) + ' channels).')
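It says that the shape passed to the Keras library is (8, 64, 64, 64) (64 channels), but the input shape I declared in Keras's Input() function is (64, 64, 64, 1), with 1 as the channel count on the last axis. The batch size, which is 8 in my case, is not declared there. Yet Keras says the shape passed to it has 64 channels and ignores the last dimension I gave it.
The second error I get is as follows: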
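As far as I can tell from the code above, the warning comes from NumpyArrayIterator, which is created by the `aug.flow(...)` call inside the generator, not by the model's `Input()` layer. At that point the batch has been stacked but the channel axis has not yet been added (`expand_dims` only runs in the `yield` line), and `ImageDataGenerator` is written for 2D images, so it reads axis 3 as the channel axis. The NumPy-only sketch below reproduces just the shapes involved; the zero-filled arrays are placeholders for the loaded `.npz` volumes.

```python
import numpy as np

bs = 8
# Placeholder volumes standing in for the arrays loaded from disk.
scan_pixels = [np.zeros((64, 64, 64), dtype=np.float32) for _ in range(bs)]

# What aug.flow(...) receives: a 4D array, so axis 3 (size 64)
# is interpreted as "64 channels" by the 2D image iterator.
stacked = np.array(scan_pixels)
print(stacked.shape)       # (8, 64, 64, 64)

# What the model receives after the channel axis is appended.
with_channel = np.expand_dims(stacked, axis=4)
print(with_channel.shape)  # (8, 64, 64, 64, 1)
```

So the (8, 64, 64, 64) in the warning is the augmentation input, while the model still gets the 5D batch declared in `Input()`.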
ResourceExhaustedError: OOM when allocating tensor with shape[8,32,64,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv3d_transpose_3/conv3d_transpose}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node loss/mul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
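Here again I have a shape problem, this time with the tensor shape. My shape should be (8,64,64,64,1), but the reported shape is (8,32,64,64,64). Not only is my channel count huge, I also have no idea where the 32 comes from. Are tensor shapes interpreted differently here? I suspect something is wrong with my input shape (that I am unknowingly setting it to something very large) and that this is what causes the OOM error.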
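One possible reading of the reported shape, offered as an assumption rather than a confirmed diagnosis: the failing node is `conv3d_transpose_3`, the third `Conv3DTranspose` in `get_unet`, which has `n_filters*2 = 32` filters and outputs 64x64x64 feature maps. If TensorFlow's GPU kernel uses a channels-first layout internally (batch, channels, depth, height, width), then [8,32,64,64,64] would be that layer's activation, not the model input. The sketch below only does the size arithmetic for float32 tensors; `tensor_mib` is a hypothetical helper.

```python
import numpy as np

def tensor_mib(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB (float32 by default)."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * bytes_per_element / 2**20

reported = (8, 32, 64, 64, 64)     # shape from the OOM message
model_input = (8, 64, 64, 64, 1)   # shape of one input batch

print(tensor_mib(reported))     # 256.0
print(tensor_mib(model_input))  # 8.0
```

A single activation of this kind is 32 times larger than the input batch, and training keeps many such full-resolution feature maps alive for backpropagation, so the total can exceed GPU memory even when the input shape itself is correct.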