Below is the architecture of the fine-tuned network built on a VGG16 base model.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
model_1 (Model) (None, 25088) 14714688
_________________________________________________________________
dense_1 (Dense) (None, 512) 12845568
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 513
=================================================================
Total params: 27,823,425
Trainable params: 26,087,937
Non-trainable params: 1,735,488
_________________________________________________________________
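The construction code is not shown, but the summary above is consistent with a setup along these lines (a sketch assuming Keras 2.x; the activations, dropout rates, and 'imagenet' weights are guesses, and freezing blocks 1-3 matches the 1,735,488 non-trainable parameters):

from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras.layers import Dense, Dropout, Flatten

# Convolutional base: VGG16 without its classifier, flattened to (None, 25088).
# This inner Model is what appears as "model_1" in the summary above.
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in vgg.layers:
    # Freezing blocks 1-3 reproduces the 1,735,488 non-trainable parameters
    if layer.name.startswith(('block1', 'block2', 'block3')):
        layer.trainable = False
base_model = Model(vgg.input, Flatten()(vgg.output))

# Classifier head wrapped in a Sequential model (which, as it turns out
# below, is what breaks K.gradients)
model = Sequential([
    base_model,
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])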
I am trying to visualize the gradients of the input with respect to the loss, and of 'block5_conv3' with respect to the output, using:
from keras import backend as K

def build_backprop(model, loss):
    # Gradient of the input image with respect to the loss function
    gradients = K.gradients(loss, model.input)[0]
    # Normalize the gradients
    gradients /= (K.sqrt(K.mean(K.square(gradients))) + 1e-5)
    # Keras function to calculate the gradients and loss
    return K.function([model.input], [loss, gradients])

# Input w.r.t. the loss:
# Loss function that optimizes one class
loss_function = K.mean(model.get_layer('dense_3').output)
# Backprop function
backprop = build_backprop(model.get_layer('model_1').get_layer('input_1'), loss_function)

# block5_conv3 w.r.t. the output:
K.gradients(model.get_layer("dense_3").output, model.get_layer("model_1").get_layer("block5_conv3").output)[0]
Both of the above return AttributeError: 'NoneType' object has no attribute 'dtype', which means that in both cases the output of K.gradients is None.
What could cause the gradients to be None?
Is there any way to fix this kind of error?
Update
Converting the model from the Sequential API to the Functional API resolved the None problem.
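A sketch of what that conversion can look like (my reconstruction rather than the exact code; the activations and dropout rates are guesses, while freezing everything below block5 matches the 7,635,264 non-trainable parameters shown below):

from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Input, Flatten, Dense, Dropout

# Rebuild the whole network as one flat Functional graph, so that
# K.gradients can trace from the output back to any inner conv layer.
inp = Input(shape=(224, 224, 3))
vgg = VGG16(weights='imagenet', include_top=False, input_tensor=inp)
for layer in vgg.layers:
    # Freezing everything below block5 matches the 7,635,264
    # non-trainable parameters in the summary below
    if not layer.name.startswith('block5'):
        layer.trainable = False
x = Flatten()(vgg.output)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(2, activation='softmax')(x)   # now a 2-way softmax head
model = Model(inp, out)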
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_10 (Dense) (None, 512) 12845568
_________________________________________________________________
dropout_7 (Dropout) (None, 512) 0
_________________________________________________________________
dense_11 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_8 (Dropout) (None, 512) 0
_________________________________________________________________
dense_12 (Dense) (None, 2) 1026
=================================================================
Total params: 27,823,938
Trainable params: 20,188,674
Non-trainable params: 7,635,264
_________________________________________________________________
Above is the new architecture after the change. The problem now is that all the gradients come out as zeros. For example:
# Grad-CAM style computation: weight the last conv layer's activations
# by the pooled gradients of the predicted class score
preds = model.predict(x)
class_idx = np.argmax(preds[0])
class_output = model.output[:, class_idx]
last_conv_layer = model.get_layer("block5_conv3")
grads = K.gradients(class_output, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
Both pooled_grads_value and conv_layer_output_value come out as all zeros.
Answer 0 (score: 0)
I was able to solve both problems.
Problem 1: Both of the above return AttributeError: 'NoneType' object has no attribute 'dtype', which means that in both cases the output of K.gradients is None.
The issue here was that the model was a Sequential model; after converting it from Sequential to Functional this problem disappeared, and a new one appeared.
Problem 2: The outputs of pooled_grads_value and conv_layer_output_value are all zeros.
I solved this by converting the last softmax layer to a linear layer.
Here is the code:
from vis.utils import utils
from keras import activations
# Utility to search for layer index by name.
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'dense_12')
# Swap softmax with linear
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)
This swap works fine and I got the expected results.
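For example, re-running the earlier gradient check against the modified model should now give non-zero values (a sketch reusing x and class_idx from the snippet above):

from keras import backend as K

# After the softmax -> linear swap, the gradients should no longer vanish
class_output = model.output[:, class_idx]
grads = K.gradients(class_output, model.get_layer("block5_conv3").output)[0]
check = K.function([model.input], [K.max(K.abs(grads))])
print(check([x])[0])   # expected to be > 0 now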
The only part I still don't understand is why it does not work with softmax. Would it also work if we replaced the last softmax layer with a single-output sigmoid?