Below is the architecture of the fine-tuned network built on a VGG16 base model.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
model_1 (Model) (None, 25088) 14714688
_________________________________________________________________
dense_1 (Dense) (None, 512) 12845568
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 513
=================================================================
Total params: 27,823,425
Trainable params: 26,087,937
Non-trainable params: 1,735,488
_________________________________________________________________
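The construction code is not shown, but the summary above is consistent with a setup along these lines (a sketch assuming Keras 2.x; the activations, dropout rates, and 'imagenet' weights are guesses, and freezing blocks 1-3 matches the 1,735,488 non-trainable parameters):

from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras.layers import Dense, Dropout, Flatten

# Convolutional base: VGG16 without its classifier, flattened to (None, 25088).
# This inner Model is what appears as "model_1" in the summary above.
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in vgg.layers:
    # Freezing blocks 1-3 reproduces the 1,735,488 non-trainable parameters
    if layer.name.startswith(('block1', 'block2', 'block3')):
        layer.trainable = False
base_model = Model(vgg.input, Flatten()(vgg.output))

# Classifier head wrapped in a Sequential model (which, as it turns out
# below, is what breaks K.gradients)
model = Sequential([
    base_model,
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])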
I am trying to visualize the gradients of the input with respect to the loss, and of 'block5_conv3' with respect to the output, using:
from keras import backend as K

def build_backprop(model, loss):
    # Gradient of the input image with respect to the loss function
    gradients = K.gradients(loss, model.input)[0]
    # Normalize the gradients
    gradients /= (K.sqrt(K.mean(K.square(gradients))) + 1e-5)
    # Keras function to calculate the gradients and loss
    return K.function([model.input], [loss, gradients])

# Input w.r.t. the loss:
# Loss function that optimizes one class
loss_function = K.mean(model.get_layer('dense_3').output)
# Backprop function
backprop = build_backprop(model.get_layer('model_1').get_layer('input_1'), loss_function)

# block5_conv3 w.r.t. the output:
K.gradients(model.get_layer("dense_3").output, model.get_layer("model_1").get_layer("block5_conv3").output)[0]
Both of the above return AttributeError: 'NoneType' object has no attribute 'dtype', which means that in both cases the output of K.gradients is None.
What could cause the gradients to be None?
Is there any way to fix this kind of error?
Update
Converting the model from the Sequential API to the Functional API resolved the None problem.
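A sketch of what that conversion can look like (my reconstruction rather than the exact code; the activations and dropout rates are guesses, while freezing everything below block5 matches the 7,635,264 non-trainable parameters shown below):

from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Input, Flatten, Dense, Dropout

# Rebuild the whole network as one flat Functional graph, so that
# K.gradients can trace from the output back to any inner conv layer.
inp = Input(shape=(224, 224, 3))
vgg = VGG16(weights='imagenet', include_top=False, input_tensor=inp)
for layer in vgg.layers:
    # Freezing everything below block5 matches the 7,635,264
    # non-trainable parameters in the summary below
    if not layer.name.startswith('block5'):
        layer.trainable = False
x = Flatten()(vgg.output)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(2, activation='softmax')(x)   # now a 2-way softmax head
model = Model(inp, out)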
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_10 (Dense) (None, 512) 12845568
_________________________________________________________________
dropout_7 (Dropout) (None, 512) 0
_________________________________________________________________
dense_11 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_8 (Dropout) (None, 512) 0
_________________________________________________________________
dense_12 (Dense) (None, 2) 1026
=================================================================
Total params: 27,823,938
Trainable params: 20,188,674
Non-trainable params: 7,635,264
_________________________________________________________________
Above is the new architecture after the change. The problem now is that all the gradients come out as zeros. For example:
# Grad-CAM style computation: weight the last conv layer's activations
# by the pooled gradients of the predicted class score
preds = model.predict(x)
class_idx = np.argmax(preds[0])
class_output = model.output[:, class_idx]
last_conv_layer = model.get_layer("block5_conv3")
grads = K.gradients(class_output, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
Both pooled_grads_value and conv_layer_output_value come out as all zeros.
Answer 0 (score: 0)
I was able to solve both problems.
Problem 1: Both of the above return AttributeError: 'NoneType' object has no attribute 'dtype', which means that in both cases the output of K.gradients is None.
The issue here was that the model was a Sequential model; after converting it from Sequential to Functional this problem disappeared, and a new one appeared.
Problem 2: The outputs of pooled_grads_value and conv_layer_output_value are all zeros.
I solved this by converting the last softmax layer to a linear layer.
Here is the code:
from vis.utils import utils
from keras import activations
# Utility to search for layer index by name.
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'dense_12')
# Swap softmax with linear
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)
This swap works fine and I got the expected results.
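For example, re-running the earlier gradient check against the modified model should now give non-zero values (a sketch reusing x and class_idx from the snippet above):

from keras import backend as K

# After the softmax -> linear swap, the gradients should no longer vanish
class_output = model.output[:, class_idx]
grads = K.gradients(class_output, model.get_layer("block5_conv3").output)[0]
check = K.function([model.input], [K.max(K.abs(grads))])
print(check([x])[0])   # expected to be > 0 now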
The only part I still don't understand is why it does not work with softmax. Would it also work if we replaced the last softmax layer with a single-output sigmoid?