Question

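I am a little confused about how I should use/insert a "BatchNorm" layer in my models. I have seen several different approaches, for instance:

ResNets: `"BatchNorm"` + `"Scale"` (no parameter sharing)

The "BatchNorm" layer is followed immediately by a "Scale" layer: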

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
}

cifar10 example: only `"BatchNorm"`

In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}

cifar10: different `batch_norm_param` for `TRAIN` and `TEST`

`batch_norm_param: use_global_stats` is changed between the `TRAIN` and `TEST` phases:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}

So what should it be?

How should one use the "BatchNorm" layer in caffe?

Answer 1

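If you follow the original batch normalization paper, the "BatchNorm" layer should be followed by "Scale" and "Bias" layers (the bias can be included via the "Scale" layer, although this makes the bias parameters inaccessible). use_global_stats should also change from training (false) to testing/deployment (true), which is the default behavior when the field is left unset. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to true there.

I'm not sure about the shared parameters.

I made a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it. After that, I never got back to it.

Note that I think lr_mult: 0 for "BatchNorm" is no longer required (and perhaps no longer allowed?), although I cannot find the corresponding PR right now.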
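Putting these points together, a single layer pair that serves both phases might look like the sketch below. This is an illustration rather than a canonical recipe: the names "conv1", "bn1" and "scale1" are hypothetical, and it assumes a Caffe version in which "BatchNorm" picks use_global_stats from the phase when the field is left unset and does not need explicit lr_mult: 0 entries.

```prototxt
# Hypothetical layer/blob names; adapt them to your own net.
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # use_global_stats is left unset on purpose: batch statistics are then
  # used in the TRAIN phase and accumulated global statistics in TEST.
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true  # learn a per-channel shift in addition to the scale
  }
}
```

With this layout there is no need to duplicate the layer with separate include { phase: ... } blocks as in the cifar10 example; setting use_global_stats: true explicitly only matters for a deploy-only prototxt such as the ResNet one.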

Answer 2

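After each BatchNorm layer, we have to add a Scale layer in Caffe. The reason is that the Caffe BatchNorm layer only subtracts the mean from the input data and divides by the standard deviation (the square root of the variance plus a small epsilon); it does not include the γ and β parameters that respectively scale and shift the normalized distribution. The Keras BatchNormalization layer, by contrast, includes and applies all of the parameters mentioned above. Using a Scale layer with the "bias_term" parameter set to true in Caffe provides a safe way to reproduce the exact behavior of the Keras version. Source: https://www.deepvisionconsulting.com/from-keras-to-caffe/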
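Written as formulas, the division of labour between the two layers can be sketched as follows. The symbols are assumptions introduced here for illustration: x is an input activation, μ and σ² are the per-channel mean and variance, ε is a small constant for numerical stability, and γ and β are the learned scale and shift.

```latex
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \quad \text{(BatchNorm)},
\qquad
y = \gamma \, \hat{x} + \beta \quad \text{(Scale with bias\_term: true)}
```

Composing the two gives the full transform from the batch normalization paper, which is what a single Keras BatchNormalization layer computes.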
