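I am trying to visualize saliency maps for the Attention OCR model, but my approach does not seem to reproduce the results shown in the paper. The code I used is below. I used the Saliency library by PAIR.
I only modified the demo_inference script that ships with the Attention OCR GitHub repo. My approach (similar to the example notebook provided by PAIR) is to take the gradient of the logit of the specific neuron that gets activated with respect to each pixel of the input image.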
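To make the idea concrete before the real script, here is a minimal NumPy sketch of SmoothGrad on a toy function. It is only an illustration under assumptions, not the PAIR implementation: `toy_logit`, `toy_gradient`, and the random weights are hypothetical stand-ins for the selected logit and its gradient, and the noise scale (a fraction of the input's value range) reflects my understanding of what `stdev_spread` means.

```python
import numpy as np

def toy_logit(x, w):
    """Hypothetical stand-in for the selected logit: a weighted sum of squared pixels."""
    return np.sum(w * x ** 2)

def toy_gradient(x, w):
    """Analytic gradient of toy_logit with respect to every pixel."""
    return 2.0 * w * x

def smoothgrad(x, grad_fn, stdev_spread=0.03, nsamples=16, rng=None):
    """Average the gradient over noisy copies of x (the SmoothGrad idea)."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: the noise scale is a fraction of the input's value range.
    stdev = stdev_spread * (x.max() - x.min())
    total = np.zeros_like(x, dtype=np.float64)
    for _ in range(nsamples):
        noisy = x + rng.normal(0.0, stdev, size=x.shape)
        total += grad_fn(noisy)
    return total / nsamples

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))      # toy "image" with values in [0, 1)
weights = rng.random((8, 8, 3))    # toy model parameters
mask = smoothgrad(image, lambda v: toy_gradient(v, weights), nsamples=64, rng=rng)
```

For this toy function the averaged mask stays close to the plain gradient, because the gradient is linear in the input. On a real network the averaging is meant to suppress local fluctuations in the gradient. The actual script against the Attention OCR graph follows.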
# Logit node names
logits = graph.get_tensor_by_name('AttentionOcr_v1/sequence_logit_fn/SQLR/concat:0')
#print(logits)
neuron_selector = tf.placeholder(tf.int32)
character_no = tf.placeholder(tf.int32)
y = logits[0][character_no][neuron_selector]
prediction = graph.get_tensor_by_name('AttentionOcr_v1/ArgMax:0')
#print(prediction)
x = graph.get_tensor_by_name('map/TensorArrayStack/TensorArrayGatherV3:0')

# Iterate over the images
for im_no in range(images_data.shape[0]):
    # Make a prediction.
    prediction_class, output = sess.run([prediction, endpoints.predicted_text], feed_dict={images_placeholder: [images_data[im_no]]})
    prediction_class = prediction_class[0]
    output = output[0].decode()
    print("Prediction classes: " + str(prediction_class) + output)
    fig = plt.figure()
    gradient_saliency = saliency.GradientSaliency(graph, sess, y, x)
    for ch_no, char_class in enumerate(prediction_class):
        # Use the PAIR library for calculating the saliency map
        starttime = time()
        smoothgrad_mask_3d = gradient_saliency.GetSmoothedMask(images_data[im_no].astype(np.float32), stdev_spread=.03, nsamples=16, feed_dict={neuron_selector: char_class, character_no: ch_no})
        smoothgrad_mask_grayscale = saliency.VisualizeImageGrayscale(smoothgrad_mask_3d)
        endtime = time()
        print(endtime - starttime)
        mask = smoothgrad_mask_grayscale
        mask *= 255 / mask.max()  # Normalise the mask to 0-255 for the alpha channel
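For reference, the last two steps (collapsing the 3D mask to grayscale and rescaling it to 0-255) can be sketched in plain NumPy as below. This is only my understanding of what the grayscale step does (sum of absolute values over channels, clipped at a high percentile), not the library's code. The helper names `to_grayscale_mask` and `to_alpha_channel` are hypothetical, and the zero guards are an addition that the script above does not have.

```python
import numpy as np

def to_grayscale_mask(mask_3d, percentile=99):
    """Collapse an (H, W, C) gradient mask to an (H, W) map scaled to [0, 1]."""
    mask_2d = np.sum(np.abs(mask_3d), axis=2)
    vmax = np.percentile(mask_2d, percentile)
    vmin = mask_2d.min()
    span = vmax - vmin
    if span <= 0:  # constant mask: nothing to highlight
        return np.zeros_like(mask_2d, dtype=np.float64)
    return np.clip((mask_2d - vmin) / span, 0.0, 1.0)

def to_alpha_channel(mask_2d):
    """Scale a [0, 1] map to uint8 0-255, guarding against an all-zero mask."""
    peak = mask_2d.max()
    if peak <= 0:
        return np.zeros(mask_2d.shape, dtype=np.uint8)
    return np.round(mask_2d * (255.0 / peak)).astype(np.uint8)

rng = np.random.default_rng(0)
fake_gradients = rng.normal(size=(32, 32, 3))  # stand-in for a SmoothGrad mask
alpha = to_alpha_channel(to_grayscale_mask(fake_gradients))
flat_alpha = to_alpha_channel(to_grayscale_mask(np.zeros((4, 4, 3))))
```

Without the guard, `mask *= 255/mask.max()` divides by zero whenever the mask is entirely zero, which is worth ruling out when the output looks wrong.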
[Image: saliency map for my use case, example 1]
[Image: saliency map for my use case, example 2]
The output looks noisy, and even when the prediction is correct, the saliency does not cover the characters being predicted in the image.