TensorFlow: different activation values for the same image

Date: 2017-09-23 18:18:01

Tags: python machine-learning tensorflow conv-neural-network batch-normalization

I am trying to retrain (read: fine-tune) a MobileNet image classifier.

The retraining script provided by TensorFlow here (from the tutorial) only updates the weights of the newly added fully connected layer. I modified this script to update the weights of all layers of the pre-trained model, roughly as in the sketch below. I am using the MobileNet architecture with a depth multiplier of 0.25 and an input size of 128.
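A minimal TF 1.x sketch of that change, assuming the new layer lives under a scope like 'final_training_ops' and that cross_entropy is the loss on the new head (both names are illustrative):

import tensorflow as tf

# stock retrain script: optimize only the newly added fully connected layer
fc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                            scope='final_training_ops')  # assumed scope name
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(
    cross_entropy, var_list=fc_vars)

# modified script: drop var_list so gradients update every trainable variable
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)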

While retraining, however, I observed something strange: if I feed a particular image for inference in a batch together with some other images, the activation values after certain layers differ from those obtained when the image is passed alone. Moreover, activations for the same image differ across different batches. Example: with two batches, batch_1: [img1, img2, img3] and batch_2: [img1, img4, img5], the activations for img1 differ between the two batches.

Here is the code I use for inference:

import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

# graph, decoded_image_tensor and jpeg_data_tensor are built earlier in the script
with tf.Session(graph=tf.get_default_graph()) as sess:
    image_path = '/tmp/images/10dsf00003.jpg'
    jpeg_data = gfile.FastGFile(image_path, 'rb').read()

    # The line below loads the jpeg using tf.decode_jpeg and does some preprocessing
    image = sess.run(decoded_image_tensor, {jpeg_data_tensor: jpeg_data})

    input_image_tensor = graph.get_tensor_by_name('input:0')  # tensor names need the ':0' suffix

    layerXname = 'MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu:0'  # name of the layer whose activations to inspect
    layerX = graph.get_tensor_by_name(layerXname)
    layerXactivations = sess.run(layerX, {input_image_tensor: image})

The above code is executed once as is, and once with only the last line changed to the following:

layerXactivations_batch = sess.run(
    layerX,
    {input_image_tensor: np.asarray([np.squeeze(image), np.squeeze(image), np.squeeze(image)])})
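To compare the two runs I check the single-image result against the first slice of the batch result, e.g. (the tolerance is arbitrary):

print(np.allclose(layerXactivations, layerXactivations_batch[0], atol=1e-5))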

Here are some of the nodes in the graph:

[u'input',
 u'MobilenetV1/Conv2d_0/weights',
 u'MobilenetV1/Conv2d_0/weights/read',
 u'MobilenetV1/MobilenetV1/Conv2d_0/convolution',
 u'MobilenetV1/Conv2d_0/BatchNorm/beta',
 u'MobilenetV1/Conv2d_0/BatchNorm/beta/read',
 u'MobilenetV1/Conv2d_0/BatchNorm/gamma',
 u'MobilenetV1/Conv2d_0/BatchNorm/gamma/read',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean/read',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance/read',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add/y',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/Rsqrt',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_2',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/sub',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add_1',
 u'MobilenetV1/MobilenetV1/Conv2d_0/Relu6',
 u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights',
 u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read',
 ... ...]

Now with layerXname = 'MobilenetV1/MobilenetV1/Conv2d_0/convolution' the activations are identical in both cases (i.e. layerXactivations and layerXactivations_batch[0] are the same). But for every layer after this one, the activation values differ. My feeling is that the BatchNorm operation following the 'MobilenetV1/MobilenetV1/Conv2d_0/convolution' layer behaves differently for a batched input than for a single image. Or is the problem caused by something else?
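As a sanity check on that hypothesis, here is a small self-contained NumPy sketch (toy numbers, not actual MobileNet values) showing that normalizing with batch statistics, which is what batch norm does in training mode, makes img1's output depend on its batch-mates:

import numpy as np

def batch_norm_train_mode(batch, eps=1e-3):
    # normalize each feature with the mean/variance of the current batch,
    # as batch norm does when is_training=True
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + eps)

img1 = np.array([1.0, 2.0, 3.0])
batch_1 = np.stack([img1, np.array([0.5, 1.0, 1.5]), np.array([2.0, 4.0, 6.0])])
batch_2 = np.stack([img1, np.array([9.0, 9.0, 9.0]), np.array([7.0, 8.0, 9.0])])

print(batch_norm_train_mode(batch_1)[0])  # img1 normalized with batch_1 statistics
print(batch_norm_train_mode(batch_2)[0])  # same img1, different output in batch_2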

Any help/pointers would be much appreciated.

2 answers:

Answer 0 (score: 0)

When constructing the MobileNet, there is a parameter called is_training. If you do not set it to false, the dropout layer and the batch normalization layer will give you different results in different iterations. Batch normalization will probably change the values only a little, but dropout will change them a lot, since it drops some of the input values.

Take a look at the signature of mobilenet_v1:

def mobilenet_v1(inputs,
                 num_classes=1000,
                 dropout_keep_prob=0.999,
                 is_training=True,
                 min_depth=8,
                 depth_multiplier=1.0,
                 conv_defs=None,
                 prediction_fn=tf.contrib.layers.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='MobilenetV1'):
  """Mobilenet v1 model for classification.

  Args:
    inputs: a tensor of shape [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    dropout_keep_prob: the percentage of activation values that are retained.
    is_training: whether is training or not.
    min_depth: Minimum depth value (number of channels) for all convolution ops.
      Enforced when depth_multiplier < 1, and not an active constraint when
      depth_multiplier >= 1.
    depth_multiplier: Float multiplier for the depth (number of channels)
      for all convolution ops. The value must be greater than zero. Typical
      usage will be to set this value in (0, 1) to reduce the number of
      parameters or computation cost of the model.
    conv_defs: A list of ConvDef namedtuples specifying the net architecture.
    prediction_fn: a function to get predictions out of logits.
    spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
        of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.

  Returns:
    logits: the pre-softmax activations, a tensor of size
      [batch_size, num_classes]
    end_points: a dictionary from components of the network to the corresponding
      activation.

  Raises:
    ValueError: Input rank is invalid.
  """

Answer 1 (score: -1)

This is caused by batch normalization.

How are you performing inference? Are you loading the model from checkpoint files, or are you using a frozen protobuf model? If you use a frozen model, you can expect similar results for different formats of input.
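A minimal sketch of loading a frozen protobuf for inference (the file path is illustrative):

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('/tmp/frozen_mobilenet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    # variables are already folded into constants in a frozen graph,
    # so importing it is all that is needed before running inference
    tf.import_graph_def(graph_def, name='')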

Check this. A similar question was raised there for a different application.