I'm trying to develop an image denoising model. I've been reading up on how to calculate a neural network's memory usage, and the standard approach seems to be:
params = depth_n x (kernel_width x kernel_height) x depth_(n-1) + depth_n
Adding up all the parameters in the network, I end up with 1,038,097, which is roughly 4.2 MB. Keras reports 1,038,497 parameters, so it seems I miscalculated something in the last layer, but the difference is small. The 4.2 MB is only the parameters; I've read somewhere that you should multiply by 3 to cover backpropagation and the other computations needed, which gives roughly 13 MB.
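For reference, a quick sketch applying the formula above to each layer, with (k, c_in, c_out) read off the model below (note that decoded is connected to conv8 in the code, so the last layer sees 32 input channels, which appears to account for the 400-parameter gap):

# Sketch: per-layer parameter counts from the formula above.
def conv_params(k, c_in, c_out):
    return c_out * (k * k * c_in) + c_out   # weights + biases

layers = [
    (3,    1, 1024),  # conv1
    (3, 1024,   64),  # conv2
    (3,   64,   64),  # conv3
    (3,   64,   64),  # conv4
    (7,   64,   64),  # conv5
    (5,   64,   64),  # conv6
    (5,   64,   32),  # conv7
    (3,   32,   32),  # conv8
    (5,   32,    1),  # decoded (fed by conv8, hence 32 input channels)
]
print(sum(conv_params(k, c_in, c_out) for k, c_in, c_out in layers))  # 1038497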
I have about 11 GB of GPU memory to work with, yet this model exhausts all of it. Where does all the extra required memory come from? What am I missing? I know this post may get flagged as a duplicate, but none of the other posts seem to address what I'm actually asking.
My model:
# Imports assumed by this excerpt (standalone Keras with the TensorFlow backend):
from keras.layers import Input, Conv2D
from keras.initializers import RandomUniform

def network(self):
    weights = RandomUniform(minval=-0.05, maxval=0.05, seed=None)
    input_img = Input(shape=(self.img_rows, self.img_cols, self.channels))
    conv1 = Conv2D(1024, (3,3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(input_img)
    conv2 = Conv2D(64, (3,3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv1)
    conv3 = Conv2D(64, (3,3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv2)
    conv4 = Conv2D(64, (3,3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv3)
    conv5 = Conv2D(64, (7,7), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv4)
    conv6 = Conv2D(64, (5,5), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv5)
    conv7 = Conv2D(32, (5,5), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv6)
    conv8 = Conv2D(32, (3,3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv7)
    conv9 = Conv2D(16, (3,3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv8)
    # NB: decoded is connected to conv8, so conv9 is never used -- which is
    # why conv2d_9 is missing from the summary and the last layer has 801 params.
    decoded = Conv2D(1, (5,5), kernel_initializer=weights,
                     padding='same', activation='sigmoid', use_bias=True)(conv8)
    return input_img, decoded

def compiler(self):
    self.model.compile(optimizer='RMSprop', loss='mse')
    self.model.summary()
I realize my model is naive in many ways and could be improved in plenty of them (dropout, other filter sizes and counts, optimizer, etc.), and any such suggestions are welcome, but the actual question remains: why does this model consume so much memory? Is it because conv1 is so deep?
Model summary:
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1751, 480, 1)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 1751, 480, 1024)   10240
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 1751, 480, 64)     589888
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 1751, 480, 64)     36928
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 1751, 480, 64)     36928
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 1751, 480, 64)     200768
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 1751, 480, 64)     102464
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 1751, 480, 32)     51232
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 1751, 480, 32)     9248
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 1751, 480, 1)      801
=================================================================
Total params: 1,038,497
Trainable params: 1,038,497
Non-trainable params: 0
_________________________________________________________________
Answer 0 (score: 2)
You are correct, it's due to the number of filters in conv1. What you have to compute is the memory required to store the activations:
As model.summary() shows, the output of that layer has shape (None, 1751, 480, 1024). For a single image, that is 1751*480*1024 values in total. Since your images are presumably float32, each value takes 4 bytes to store. The output of this layer therefore requires 1751*480*1024*4 bytes, about 3.2 GB per image for this one layer alone.
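To make the arithmetic concrete, a minimal sketch of that calculation (assuming float32 activations, as above):

# Memory needed to hold conv1's output for a single image.
h, w, c = 1751, 480, 1024                 # output shape of conv2d_1
layer_bytes = h * w * c * 4               # float32: 4 bytes per value
print(f"{layer_bytes / 1024**3:.1f} GB")  # ~3.2 GB for this one layer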
If you change the number of filters to 64 instead, you only need about 200 MB per image.
Either change the number of filters, or change the batch size to 1.
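For a rough lower bound on the whole forward pass, the same calculation can be summed over every output shape in model.summary() (an estimate only; the gradients kept for backpropagation add roughly as much again on top):

# Forward-pass activation memory per image, summed over all layer outputs.
h, w = 1751, 480
channels = [1, 1024, 64, 64, 64, 64, 64, 32, 32, 1]  # from model.summary()
total = sum(h * w * c * 4 for c in channels)         # float32
print(f"{total / 1024**3:.1f} GB per image")         # ~4.4 GB before gradients

At roughly 4.4 GB of forward activations per image, any batch size above 1 plus the memory for gradients will easily exhaust 11 GB.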