So, I'm having a hard time getting RMSprop and Adam to work.
I have implemented momentum correctly as an optimization algorithm: compared with plain gradient descent, momentum brings the cost down faster, and for the same number of epochs the model also reaches a higher accuracy on the test set.
Here is the code:
# only momentum
elif name == 'momentum':
    # calculate momentum for every layer
    for i in range(self.number_of_layers - 1):
        self.v[f'dW{i}'] = beta1 * self.v[f'dW{i}'] + (1 - beta1) * self.gradients[f'dW{i}']
        self.v[f'db{i}'] = beta1 * self.v[f'db{i}'] + (1 - beta1) * self.gradients[f'db{i}']
    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * self.v[f'dW{i}']
        self.biases[i] = self.biases[i] - self.learning_rate * self.v[f'db{i}']
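For a single parameter, the update I'm following is equivalent to this stand-alone version (a minimal sketch with my own function name and defaults, not the class code):

import numpy as np

def momentum_step(param, grad, v, learning_rate=0.01, beta1=0.9):
    # v is the exponentially weighted average of past gradients,
    # carried over between calls
    v = beta1 * v + (1 - beta1) * grad
    param = param - learning_rate * v
    return param, v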
I've done my best to implement RMSprop and Adam, but neither of them works. The code is below. Any help on why they don't work would be greatly appreciated!
# only rms
elif name == 'rms':
    # calculate rmsprop for every layer
    for i in range(self.number_of_layers - 1):
        self.s[f'dW{i}'] = beta2 * self.s[f'dW{i}'] + (1 - beta2) * self.gradients[f'dW{i}']**2
        self.s[f'db{i}'] = beta2 * self.s[f'db{i}'] + (1 - beta2) * self.gradients[f'db{i}']**2
    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * self.gradients[f'dW{i}'] / (np.sqrt(self.s[f'dW{i}']) + epsilon)
        self.biases[i] = self.biases[i] - self.learning_rate * self.gradients[f'db{i}'] / (np.sqrt(self.s[f'db{i}']) + epsilon)
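For reference, this is the per-parameter update I believe RMSprop is supposed to perform (again a minimal stand-alone sketch; the function name and default values are mine, not from my class):

import numpy as np

def rmsprop_step(param, grad, s, learning_rate=0.001, beta2=0.999, epsilon=1e-8):
    # s is the exponentially weighted average of the squared gradients,
    # carried over between calls
    s = beta2 * s + (1 - beta2) * np.square(grad)
    param = param - learning_rate * grad / (np.sqrt(s) + epsilon)
    return param, s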
# adam optimizer
elif name == 'adam':
    # counter
    # this resets every time an epoch finishes
    self.t += 1
    # loop through layers
    for i in range(self.number_of_layers - 1):
        # calculate v and s
        self.v[f'dW{i}'] = beta1 * self.v[f'dW{i}'] + (1 - beta1) * self.gradients[f'dW{i}']
        self.v[f'db{i}'] = beta1 * self.v[f'db{i}'] + (1 - beta1) * self.gradients[f'db{i}']
        self.s[f'dW{i}'] = beta2 * self.s[f'dW{i}'] + (1 - beta2) * np.square(self.gradients[f'dW{i}'])
        self.s[f'db{i}'] = beta2 * self.s[f'db{i}'] + (1 - beta2) * np.square(self.gradients[f'db{i}'])
        # bias correction
        self.v1[f'dW{i}'] = self.v[f'dW{i}'] / (1 - beta1**self.t)
        self.v1[f'db{i}'] = self.v[f'db{i}'] / (1 - beta1**self.t)
        self.s1[f'dW{i}'] = self.s[f'dW{i}'] / (1 - beta2**self.t)
        self.s1[f'db{i}'] = self.s[f'db{i}'] / (1 - beta2**self.t)
    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * np.divide(self.v1[f'dW{i}'], (np.sqrt(self.s1[f'dW{i}']) + epsilon))
        self.biases[i] = self.biases[i] - self.learning_rate * np.divide(self.v1[f'db{i}'], (np.sqrt(self.s1[f'db{i}']) + epsilon))
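And this is my understanding of a single Adam update (a minimal sketch with my own names; note that here t is meant to count updates since the start of training, while in my class it resets every epoch):

import numpy as np

def adam_step(param, grad, v, s, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    # t is the total number of updates performed so far,
    # counted from the start of training (it keeps growing across epochs)
    v = beta1 * v + (1 - beta1) * grad                # first moment
    s = beta2 * s + (1 - beta2) * np.square(grad)     # second moment
    v_hat = v / (1 - beta1 ** t)                      # bias correction
    s_hat = s / (1 - beta2 ** t)
    param = param - learning_rate * v_hat / (np.sqrt(s_hat) + epsilon)
    return param, v, s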
Additional information: epsilon = 1e-8, beta1 = 0.9, beta2 = 0.999.