caffe python manual sgd

Date: 2016-04-06 18:19:35

Tags: python caffe pycaffe

I am trying to implement the SGD update myself in pycaffe, updating the weights from Python instead of calling solver.step(). The goal is to make the weights obtained after solver.step() match the weights obtained by updating them manually.

The setup is as follows: use the MNIST data. Set the random seed in solver.prototxt to random_seed: 52. Make sure momentum: 0.0, base_lr: 0.01, and lr_policy: "fixed". With that in place I only need to implement the plain SGD update (no momentum, regularization, etc.). The equation is simply: W_{t+1} = W_t - mu * W_t_diff
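As a sanity check, here is a minimal NumPy sketch of that update rule in isolation (the array values are made up for illustration; mu plays the role of base_lr):

    import numpy as np

    mu = 0.01                          # learning rate (base_lr)
    W_t = np.array([0.5, -0.3])        # current weights (illustrative values)
    W_t_diff = np.array([0.2, 0.1])    # gradients from backprop (illustrative values)

    W_t1 = W_t - mu * W_t_diff         # W_{t+1} = W_t - mu * W_t_diff
    print W_t1                         # [ 0.498 -0.301]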

Here are the two tests:

Test 1: Use caffe's forward() and backward() to compute the forward and backward passes. For each layer that has weights, scale the diffs by the learning rate:

    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr # weights
        solver.net.layers[k].blobs[1].diff[...] *= lr # biases

Then update the weights/biases as:

        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

I run this for 5 iterations.

Test 2: Run caffe's solver.step(5).

Now, I expect the two tests to yield exactly the same weights after these iterations.

I save the weight values after each of the two tests and compute the norm of the difference between the resulting weight vectors, and they are not exactly equal. Can anyone spot what I might be missing?

Here is the entire code for reference:

import caffe
caffe.set_device(0)
caffe.set_mode_gpu()
import numpy as np
from copy import copy  # copy() is used below to snapshot the weights

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01
momentum = 0.

# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)

# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx,l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]

for it in range(1, niter+1):
    solver.net.forward()  # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)

The last line, which compares the weights from the two tests, prints:

after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05

I would expect this difference to be 0.0.

Any thoughts?

1 Answer:

Answer 0 (score: 4)

What you have is almost correct; you just need to set the diffs to zero after each update. Caffe does not do this automatically, in order to give you the chance to implement batch accumulation (accumulating the gradients over several batches for a single weight update, which can help when there is not enough memory for the desired batch size).
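As a rough sketch of what that batch accumulation can look like (this reuses solver, niter, and lr from your code; accumulate_steps and the gradient averaging are illustrative choices, not a built-in caffe option):

    accumulate_steps = 4   # illustrative: fold 4 mini-batches into one weight update
    for it in range(niter):
        for _ in range(accumulate_steps):
            solver.net.forward()
            solver.net.backward()   # param diffs keep accumulating; caffe does not clear them
        for layer in solver.net.params:
            for idx in (0, 1):      # 0: weights, 1: biases
                blob = solver.net.params[layer][idx]
                blob.data[...] -= lr * blob.diff / accumulate_steps  # averaged gradient step
                blob.diff[...] *= 0                                  # clear the diffs by hand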

Another possible issue is the use of cudnn: its convolution implementation (or, more precisely, the way it is configured to be used in caffe) is non-deterministic. Normally this is not a problem, but in your case it leads to slightly different results each run, and therefore different weights. If you compiled caffe with cudnn, you can simply switch to CPU mode to avoid this while testing.

The solver parameters also influence the weight updates. As you already noted, you should make sure of the following (a minimal solver.prototxt sketch follows this list):

  • lr_policy: "fixed"
  • momentum: 0
  • weight_decay: 0
  • random_seed: 52 # or any other constant
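Put together, a minimal solver.prototxt for such a comparison might look like this sketch (the net path and the solver_mode line are illustrative additions; solver_mode: CPU simply sidesteps the cudnn issue mentioned above):

    net: "lenet_train.prototxt"   # illustrative path to your net definition
    base_lr: 0.01
    lr_policy: "fixed"
    momentum: 0
    weight_decay: 0
    random_seed: 52               # or any other constant
    solver_mode: CPU              # avoids cudnn non-determinism while testing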

In the net itself, make sure not to use per-parameter learning rate multipliers; commonly the biases are learned twice as fast as the weights, but that is not the behavior your implementation assumes. So you need to make sure both multipliers are set to one in the layer definitions:

param {
  lr_mult: 1 # weight lr multiplier
}
param {
  lr_mult: 1 # bias lr multiplier
}
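Alternatively, you can keep a bias multiplier of 2 in the net and account for it on the Python side, which is what the lr_b_mult factor in the example below does; caffe's effective learning rate for a parameter is base_lr * lr_mult.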

Last but not least, here is an example of what your code could look like with momentum, weight decay, and lr_mult taken into account. In CPU mode this produces the expected output (no difference):

import caffe
caffe.set_device(0)
caffe.set_mode_cpu()
import numpy as np

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = solver.net.layers[1].blobs[0].data.copy()
b_solver_step = solver.net.layers[1].blobs[1].data.copy()

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1
lr_b_mult = 2

momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for i in range(niter):
    solver.net.forward()
    solver.net.backward()
    for layer in solver.net.params:
        momentum_hist[layer][0] = momentum_hist[layer][0] * momentum + (solver.net.params[layer][0].diff + weight_decay *
                                                       solver.net.params[layer][0].data) * base_lr * lr_w_mult
        momentum_hist[layer][1] = momentum_hist[layer][1] * momentum + (solver.net.params[layer][1].diff + weight_decay *
                                                       solver.net.params[layer][1].data) * base_lr * lr_b_mult
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        solver.net.params[layer][0].diff[...] *= 0
        solver.net.params[layer][1].diff[...] *= 0

# save the weights to compare later
w_fwdbwd_update = solver.net.layers[1].blobs[0].data.copy()
b_fwdbwd_update = solver.net.layers[1].blobs[1].data.copy()

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)