Momentum, RMSprop and Adam optimizers

Time: 2020-07-13 21:20:18

Tags: python machine-learning optimization deep-learning hyperparameters

So, I'm having trouble getting RMSprop and Adam to work.

I have implemented momentum correctly as an optimization algorithm, in the sense that, compared with plain gradient descent, momentum drives the cost down faster. With momentum the model also reaches higher accuracy on the test set for the same number of epochs.
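
To be explicit about what I am comparing, the two update rules are roughly the following (same notation as the code below, where beta1 is the momentum coefficient):

# plain gradient descent
W = W - learning_rate * dW

# momentum: exponentially weighted average of past gradients
v = beta1 * v + (1 - beta1) * dW
W = W - learning_rate * v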

Here is the code:

# only momentum
elif name == 'momentum':
            
    # calculate momentum for every layer
    for i in range(self.number_of_layers - 1):
        self.v[f'dW{i}'] = beta1 * self.v[f'dW{i}'] + (1 - beta1) * self.gradients[f'dW{i}']
        self.v[f'db{i}'] = beta1 * self.v[f'db{i}'] + (1 - beta1) * self.gradients[f'db{i}']
                
    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * self.v[f'dW{i}']
        self.biases[i] = self.biases[i] - self.learning_rate * self.v[f'db{i}']

I have done my best to implement RMSprop and Adam as well, but neither works. The code is below (a minimal standalone sketch of what I think the textbook updates should look like follows after it). Any help with why it isn't working would be greatly appreciated!

# only rms
elif name == 'rms':
            
    # calculate rmsprop for every layer
    for i in range(self.number_of_layers - 1):
        self.s[f'dW{i}'] = beta2 * self.s[f'dW{i}'] + (1 - beta2) * self.gradients[f'dW{i}']**2 
        self.s[f'db{i}'] = beta2 * self.s[f'db{i}'] + (1 - beta2) * self.gradients[f'db{i}']**2
               
    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * self.gradients[f'dW{i}'] / (np.sqrt(self.s[f'dW{i}']) + epsilon)
        self.biases[i] = self.biases[i] - self.learning_rate * self.gradients[f'db{i}'] / (np.sqrt(self.s[f'db{i}']) + epsilon)
# adam optimizer
elif name == 'adam':
            
    # counter
    # this resets every time an epoch finishes
    self.t += 1
          
    # loop through layers
    for i in range(self.number_of_layers - 1):
                
        # calculate v and s
        self.v[f'dW{i}'] = beta1 * self.v[f'dW{i}'] + (1 - beta1) * self.gradients[f'dW{i}']
        self.v[f'db{i}'] = beta1 * self.v[f'db{i}'] + (1 - beta1) * self.gradients[f'db{i}']
        self.s[f'dW{i}'] = beta2 * self.s[f'dW{i}'] + (1 - beta2) * np.square(self.gradients[f'dW{i}'])
        self.s[f'db{i}'] = beta2 * self.s[f'db{i}'] + (1 - beta2) * np.square(self.gradients[f'db{i}'])
                
        # bias correction
        self.v1[f'dW{i}'] = self.v[f'dW{i}'] / (1 - beta1**self.t)
        self.v1[f'db{i}'] = self.v[f'db{i}'] / (1 - beta1**self.t)
        self.s1[f'dW{i}'] = self.s[f'dW{i}'] / (1 - beta2**self.t)
        self.s1[f'db{i}'] = self.s[f'db{i}'] / (1 - beta2**self.t)
                
    # update parameters
    for i in range(self.number_of_layers - 1):
        self.weights[i] = self.weights[i] - self.learning_rate * np.divide(self.v1[f'dW{i}'], (np.sqrt(self.s1[f'dW{i}']) + epsilon))
        self.biases[i] = self.biases[i] - self.learning_rate * np.divide(self.v1[f'db{i}'], (np.sqrt(self.s1[f'db{i}']) + epsilon))
                
# additional information
# epsilon = 1e-8
# beta1 = 0.9
# beta2 = 0.999
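
For reference, this is my understanding of the textbook RMSprop and Adam updates, as a minimal self-contained sketch (the function names and the toy quadratic are just for illustration, not part of my class):

import numpy as np

def rmsprop_step(w, dw, s, lr, beta2=0.999, eps=1e-8):
    # keep a running average of squared gradients and scale the step by its root
    s = beta2 * s + (1 - beta2) * dw**2
    w = w - lr * dw / (np.sqrt(s) + eps)
    return w, s

def adam_step(w, dw, v, s, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # t counts total update steps since the start of training
    # (in the original Adam paper it is never reset)
    v = beta1 * v + (1 - beta1) * dw
    s = beta2 * s + (1 - beta2) * dw**2
    v_hat = v / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s

# toy check: minimize f(w) = w**2, whose gradient is 2*w
w, v, s = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, v, s = adam_step(w, 2 * w, v, s, t, lr=0.01)
print(w)  # ends up close to 0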

0 Answers:

There are no answers yet.