I don't understand the inner workings of Lasagne layers. Consider the following code:
    import numpy as np
    import lasagne

    class WScaleLayer(lasagne.layers.Layer):
        """Rescales the incoming layer's weights to unit RMS and re-applies
        the removed magnitude (plus bias and nonlinearity) at the output."""

        def __init__(self, incoming, **kwargs):
            super(WScaleLayer, self).__init__(incoming, **kwargs)
            # Normalize the incoming weights in place to RMS 1 and store the
            # removed magnitude as a non-trainable 0-d parameter.
            W = incoming.W.get_value()
            scale = np.sqrt(np.mean(W ** 2))
            incoming.W.set_value(W / scale)
            self.scale = self.add_param(scale, (), name='scale',
                                        trainable=False)
            # Take the bias over from the incoming layer, so it is added
            # after the rescaling instead of before it.
            self.b = None
            if hasattr(incoming, 'b') and incoming.b is not None:
                b = incoming.b.get_value()
                self.b = self.add_param(b, b.shape, name='b',
                                        regularizable=False)
                del incoming.params[incoming.b]
                incoming.b = None
            # Likewise take over the nonlinearity: the incoming layer becomes
            # linear, and the activation is applied here, after scale and bias.
            self.nonlinearity = lasagne.nonlinearities.linear
            if hasattr(incoming, 'nonlinearity') and incoming.nonlinearity is not None:
                self.nonlinearity = incoming.nonlinearity
                incoming.nonlinearity = lasagne.nonlinearities.linear

        def get_output_for(self, v, **kwargs):
            v = v * self.scale
            if self.b is not None:
                # Broadcast the bias over all axes except the feature axis (1).
                pattern = ['x', 0] + ['x'] * (v.ndim - 2)
                v = v + self.b.dimshuffle(*pattern)
            return self.nonlinearity(v)
Can you tell me whether self.scale stays constant during training, after initialization?
Answer (score: 0)
I'm not a Lasagne expert, but unless you are doing something unusual, self.scale should not change during training. It is registered with add_param(..., trainable=False), so lasagne.layers.get_all_params(network, trainable=True), which is what the standard update rules (SGD, Adam, etc.) are given, never returns it, and no update expression is ever built for it.
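You can check this yourself. Here is a minimal sketch (the DenseLayer and its sizes are made up for illustration) that wraps a layer and prints which parameters an optimizer would actually see:

    import lasagne

    # Hypothetical setup: wrap a small DenseLayer in WScaleLayer and
    # inspect the collected parameter lists.
    l_in = lasagne.layers.InputLayer((None, 8))
    l_dense = lasagne.layers.DenseLayer(
        l_in, num_units=4, nonlinearity=lasagne.nonlinearities.rectify)
    l_ws = WScaleLayer(l_dense)

    # Training code builds updates from the trainable parameters only,
    # e.g. lasagne.updates.adam(loss, params). 'scale' is absent from
    # that list, so no update is ever computed for it.
    print(lasagne.layers.get_all_params(l_ws, trainable=True))  # [W, b]
    print(lasagne.layers.get_all_params(l_ws))                  # [W, scale, b]

So the only ways self.scale would change are explicit ones: calling set_value on it yourself, or building updates from a parameter list collected without the trainable=True filter.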
That said, the code is a little odd: scale is initialized from the RMS of the incoming layer's initial weights, and those weights are rescaled in place to compensate. Is that really what you want?
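For what it's worth, this looks like the "weight scaling" trick used in progressive-growing GAN implementations: at construction time the rescaling is a no-op on the output, since (W / scale) followed by multiplying by scale reproduces W exactly, but it changes how gradient updates affect the effective weights. A small NumPy check of the no-op property (shapes and values here are made up for illustration):

    import numpy as np

    rng = np.random.RandomState(0)
    W = rng.normal(scale=0.3, size=(8, 4)).astype(np.float32)
    x = rng.normal(size=(2, 8)).astype(np.float32)

    scale = np.sqrt(np.mean(W ** 2))        # RMS of the initial weights
    out_original = x.dot(W)                 # the plain dense layer
    out_wscaled = x.dot(W / scale) * scale  # rescaled dense + WScaleLayer

    print(np.allclose(out_original, out_wscaled, atol=1e-5))  # True

In other words, initializing scale from the incoming weights leaves the network's initial behavior unchanged; only the training dynamics differ.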