Question

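I am trying to retrain an existing pretrained network from the Object Detection API. It is ssd_mobilenet_v2, pretrained on the COCO dataset. I reproduced the steps from the tutorial that accompanies the obj-detection-API.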

The model does start training, but the %mAP is low. I am completely new to CNNs, so any help is appreciated.

As soon as I start training, the warning below appears, and I cannot find a way to fix it.

I am running it in a Google Colaboratory notebook with this command:

# Training
!python object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--num_train_steps=${NUM_TRAIN_STEPS} \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--alsologtostderr

This is the warning I get:

WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.
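When a log contains many of these warnings, it can help to pull out the variable names and both shapes programmatically. Below is a minimal Python sketch; the regular expression is written against the exact wording of the four warnings above (and tolerates extra spaces after the variable name), and `parse_shape_warnings` is a hypothetical helper, not part of the Object Detection API.

```python
import ast
import re

# Written against the exact wording of the "incompatible shape" warnings above.
WARNING_RE = re.compile(
    r"Variable \[(?P<name>[^\]]+)\]\s+is available in checkpoint, but has an "
    r"incompatible shape with model variable\. "
    r"Checkpoint shape: \[(?P<ckpt>\[[^\]]*\])\], "
    r"model variable shape: \[(?P<model>\[[^\]]*\])\]\."
)


def parse_shape_warnings(log_text):
    """Return (variable name, checkpoint shape, model shape) for each warning found."""
    results = []
    for match in WARNING_RE.finditer(log_text):
        # The inner "[1, 1, 256, 512]" part is a plain Python list literal.
        ckpt_shape = ast.literal_eval(match.group("ckpt"))
        model_shape = ast.literal_eval(match.group("model"))
        results.append((match.group("name"), ckpt_shape, model_shape))
    return results


if __name__ == "__main__":
    sample = (
        "WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] "
        "is available in checkpoint, but has an incompatible shape with model variable. "
        "Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. "
        "This variable will not be initialized from the checkpoint."
    )
    for name, ckpt, model in parse_shape_warnings(sample):
        print(name, "checkpoint:", ckpt, "model:", model)
```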

After running for 10 minutes, it prints the following:

Accumulating evaluation results...
DONE (t=1.73s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.006
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.040
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.026
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.050

I did not change the *.ckpt files. I just downloaded the original pretrained version of ssd_mobilenet_v2_coco_2018_03_29, used those files, and linked them in the .config file.

I have been trying to solve this for more than a day. Thank you for your help.

Answer 1

Your error message says (taking the first line; they are all similar):

layer_19_2_Conv2d_2_3x3_s2_512/weights is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]].

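As explained in this question & answer, the shape in the checkpoint is that of a 1x1 convolution (TensorFlow stores Conv2D kernels as [height, width, in_channels, out_channels], so the leading 1, 1 means a 1x1 kernel). The shape in the model is, correctly, that of a 3x3 convolution. This is odd, because the layer name in the checkpoint contains "3x3", which is wrong given the weight shape.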
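The comparison can be scripted: the sketch below reads the kernel size from the first two dimensions of each shape and compares it with the `3x3`-style tag in the layer name. The helper names are hypothetical, and the name parsing assumes the `_<h>x<w>_` naming pattern seen in these warnings.

```python
import re


def describe_conv_kernel(shape):
    """Interpret a TensorFlow Conv2D kernel shape [height, width, in, out]."""
    height, width, in_channels, out_channels = shape
    return {"kernel": (height, width), "in": in_channels, "out": out_channels}


def kernel_from_name(var_name):
    """Read the kernel size encoded in a name like '..._3x3_s2_512/weights'."""
    match = re.search(r"_(\d+)x(\d+)_", var_name)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_layer(var_name, ckpt_shape, model_shape):
    """Put the kernel size from the name, the checkpoint and the model side by side."""
    return {
        "name says": kernel_from_name(var_name),
        "checkpoint": describe_conv_kernel(ckpt_shape)["kernel"],
        "model": describe_conv_kernel(model_shape)["kernel"],
    }


if __name__ == "__main__":
    name = "FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights"
    print(check_layer(name, [1, 1, 256, 512], [3, 3, 256, 512]))
    # {'name says': (3, 3), 'checkpoint': (1, 1), 'model': (3, 3)}
```

A mismatch between "name says" and "checkpoint" is exactly the inconsistency described above: the checkpoint was built with 1x1 kernels for layers whose names advertise 3x3.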

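So it appears you are using a checkpoint that used 1x1 convolutions for the problematic layers, even though the names of those layers imply 3x3 convolutions. As a workaround that lets you keep using this checkpoint, you could try modifying the model: change the function that builds those layers so that it uses 1x1 convolutions instead (though I cannot say for sure which function that is).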

The low %mAP is, of course, due to part of the model being reinitialized instead of being loaded correctly from the checkpoint.
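For a sense of scale, the weights that stay at their random initialization can be counted from the model-variable shapes in the four warnings. This sketch counts only the `weights` tensors named there; any other variables in those layers are not reported in the warnings and are not included. It uses `math.prod`, which needs Python 3.8 or later.

```python
from math import prod

# Model-variable shapes reported in the four warnings (Conv2D kernels).
REINITIALIZED_SHAPES = {
    "layer_19_2_Conv2d_2_3x3_s2_512/weights": (3, 3, 256, 512),
    "layer_19_2_Conv2d_3_3x3_s2_256/weights": (3, 3, 128, 256),
    "layer_19_2_Conv2d_4_3x3_s2_256/weights": (3, 3, 128, 256),
    "layer_19_2_Conv2d_5_3x3_s2_128/weights": (3, 3, 64, 128),
}


def count_weights(shapes):
    """Total number of scalar weights across the given tensor shapes."""
    return sum(prod(shape) for shape in shapes)


if __name__ == "__main__":
    for name, shape in REINITIALIZED_SHAPES.items():
        print(f"{name}: {prod(shape):,}")
    print(f"total: {count_weights(REINITIALIZED_SHAPES.values()):,}")  # total: 1,843,200
```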

Answer 2

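I recently ran into the same problem as Miroslav (exactly the same 4 warning messages). @GPhilo is right that this warning means the checkpoint does not match the model, but there seems to have been a problem with how this particular pretrained checkpoint was generated. Specifically, the ssd_mobilenet_v2_coco_2018_03_29.tar.gz checkpoint appears to have been generated with a pre-release version of the config file. Here is a link to the related GitHub issue: https://github.com/tensorflow/models/issues/5315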

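In the end, I switched from the ssd_mobilenet_v2_coco.config file in the tensorflow/models git repo to the pipeline.config file that ships with the pretrained checkpoint. Besides the usual settings you have to change, you also need to remove the batch_norm_trainable flag. More on that error here: https://github.com/tensorflow/models/issues/4066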
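Removing the flag can be done by hand, or with a small script like the sketch below. It assumes the flag sits on its own line in the form `batch_norm_trainable: <value>`; `drop_batch_norm_trainable` and `strip_flag_in_file` are hypothetical helpers, and the path you pass is whatever your copied pipeline.config is called.

```python
import re

# Matches a whole line such as "  batch_norm_trainable: true" (any indentation or value),
# including its trailing newline, so no blank line is left behind.
FLAG_LINE_RE = re.compile(r"^[ \t]*batch_norm_trainable[ \t]*:.*\n?", re.MULTILINE)


def drop_batch_norm_trainable(config_text):
    """Return the config text with every batch_norm_trainable line removed."""
    return FLAG_LINE_RE.sub("", config_text)


def strip_flag_in_file(path):
    """Rewrite a .config file in place without the batch_norm_trainable lines."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(drop_batch_norm_trainable(text))
```

Editing the text rather than parsing it as a protobuf keeps the sketch free of Object Detection API imports; the trade-off is that it relies on the one-field-per-line layout used in the shipped config files.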

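Note: my first attempt was to switch to the quantized version of MobileNet V2 SSD, but after retraining that model on my dataset I did not reach the accuracy I was hoping for (not sure why).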


How to fix "Variable is available in checkpoint, but has an incompatible shape with model variable"?

2 answers: