Logistic regression does not converge

Asked: 2020-04-12 13:28:17

Tags: python machine-learning deep-learning logistic-regression

I have built a logistic regression model whose training is based on stochastic gradient descent backed by momentum and Adagrad. When I train the model on the MNIST dataset, I get rather strange results, for example:

Epoch 1: Accuracy = 0.8808703918876096 Cost = 1.9201423303296545
Epoch 2: Accuracy = 0.8718284567444808 Cost = 2.0658811067405587
Epoch 3: Accuracy = 0.8752086194148093 Cost = 2.011399321166389
Epoch 4: Accuracy = 0.8778704975176931 Cost = 1.9684949150267304
Epoch 5: Accuracy = 0.8881166156121263 Cost = 1.8033470025050278
Epoch 6: Accuracy = 0.8779338755677617 Cost = 1.9674733815472147
Epoch 7: Accuracy = 0.8759691560156333 Cost = 1.9991409194122003
Epoch 8: Accuracy = 0.8892996725467414 Cost = 1.7842783775540683
Epoch 9: Accuracy = 0.8799619731699588 Cost = 1.9347843102027125
Epoch 10: Accuracy = 0.8583078060631668 Cost = 2.2838082490372384

As you can see, my model is not converging. I have tried various values for the learning rate and the momentum coefficient, but I have not found a fix. The output above was produced with a learning rate of 0.1 and a momentum coefficient of 0.9, which is the most common combination. The model:

import numpy as np


class Model(object):
    def __init__(self, learning_rate: float, epochs: int, momentum_coef=0.0):
        self.momentum_coef = momentum_coef
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.epsilon = 1e-7
        self._w = np.zeros(1)
        self._velocity = np.zeros(1)
        self.grad_history_squared = 0

    def fit(self, X: np.ndarray , y: np.ndarray) -> None:
        """ Fitting weights to the data using stochasic gradient descent with momentum and Adagrad"""
        if X.shape[0] != self._w.shape[0]:
            self._w = np.random.random(X.shape[1])
        if X.shape[0] != self._velocity.shape[0]:
            self._velocity = np.zeros(X.shape[1])
        indexes = np.arange(y.shape[0])

        for epoch in range(self.epochs):
            np.random.shuffle(indexes)
            for i in indexes:
                xi, yi = X[i], y[i]
                gradient = (1/X.shape[1]) * (self.predict(xi) - yi)*xi
                self._velocity = self._velocity * self.momentum_coef + (1 - self.momentum_coef) * gradient
                self._w -= (self.learning_rate / np.sqrt(self.grad_history_squared + self.epsilon)) * self._velocity
                self.grad_history_squared += gradient**2
            print("Epoch {}: Accuracy = {} Cost = {}".format(epoch + 1, self.evaluate(self.predict(X), y),
                                                             self.compute_cost(X, y)))

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.round(self.sigmoid(X))

    def sigmoid(self, X: np.ndarray) -> np.ndarray:
        return 1/(1 + np.exp(-self._w @ X.T + self.epsilon))

    def compute_cost(self, X: np.ndarray, y: np.ndarray) -> float:
        y_pred = self.predict(X)
        return -np.average(y * np.log(y_pred + self.epsilon) + (1 - y) * np.log(1 - y_pred + self.epsilon))

    @staticmethod
    def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """ Evaluating accuracy"""
        return (len(y_true) - np.sum(np.abs(y_pred - y_true))) / len(y_true)
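
For reference, this is roughly the single update step I am trying to implement, written as a standalone sketch with placeholder names rather than as part of my class (I am also not sure whether the epsilon belongs inside or outside the square root):

import numpy as np

def momentum_adagrad_step(w, velocity, grad_sq_sum, grad,
                          learning_rate=0.1, momentum_coef=0.9, epsilon=1e-7):
    # Momentum: exponential moving average of the gradients.
    velocity = momentum_coef * velocity + (1 - momentum_coef) * grad
    # Adagrad: accumulate squared gradients, then scale the step per weight.
    grad_sq_sum = grad_sq_sum + grad ** 2
    w = w - learning_rate / np.sqrt(grad_sq_sum + epsilon) * velocity
    return w, velocity, grad_sq_sum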

I do, of course, feed properly scaled data into the training process. What is going on here? Is my implementation correct?
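
In case it matters, this is roughly how I scale the inputs and call fit, shown here with random stand-in data instead of the actual MNIST subset (only the shapes and the scaling step are meant to be representative):

import numpy as np

# Stand-in for the real data: rows are flattened 28x28 images with pixel
# values in [0, 255], labels are 0/1 (the real labels come from MNIST).
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 256, size=(1000, 784)).astype(float)
y = rng.integers(0, 2, size=1000).astype(float)

X = X_raw / 255.0  # scaling step: bring pixel values into [0, 1]

model = Model(learning_rate=0.1, epochs=10, momentum_coef=0.9)
model.fit(X, y)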

0 Answers:

No answers yet.