I am working on a very simple logistic regression problem, but it will not converge. The dataset is linearly separable, yet the loss never converges to 0.
The loss decreases very slowly and seems to level off at a constant, and the gradient does not converge to 0 either. I have verified the function that computes the gradient (via gradient checking), and it is correct. The cost function should also be correct. Changing the learning rate does not help.
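For reference, the gradient check I mention above was a standard central finite-difference comparison, along the lines of the sketch below (the helper names and the toy data in this sketch are only illustrative, not my actual script):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def cost(theta, X, y):
        p = sigmoid(X @ theta)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def grad(theta, X, y):
        return X.T @ (sigmoid(X @ theta) - y) / len(y)

    def check_gradient(theta, X, y, eps=1e-7):
        # Compare the analytic gradient with a central finite difference,
        # one coordinate of theta at a time.
        numeric = np.zeros_like(theta)
        for j in range(len(theta)):
            step = np.zeros_like(theta)
            step[j] = eps
            numeric[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
        return np.max(np.abs(numeric - grad(theta, X, y)))

    # Illustrative toy data, just to exercise the check.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
    theta = rng.normal(size=3)
    print(check_gradient(theta, X, y))  # max difference should be tiny, on the order of 1e-8 or less

Here is my actual code: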
import random
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(z):
    """ Sigmoid function """
    s = 1 / (1 + np.exp(-z))
    return s


def cost_function(theta, X, y):
    output = sigmoid(np.dot(X, theta))
    cost = 0
    m_samples = len(y)
    for i in range(m_samples):
        if y[i] == 0:
            cost += -(1 - y[i]) * np.log(1 - output[i])
        elif y[i] == 1:
            cost += -y[i] * np.log(output[i])
    cost /= m_samples
    return cost


def gradient_update(theta, X, y):
    output = sigmoid(np.dot(X, theta))
    grad = np.dot((output - y).T, X)
    grad = grad / m_samples
    return grad


def gradient_descent(theta, X, y, alpha, max_iterations, print_iterations):
    m_samples = len(y)
    iteration = 0
    X_train = X / np.max(X)
    while (iteration < max_iterations):
        iteration += 1
        gradient = gradient_update(theta, X_train, y)
        theta = theta - alpha * gradient
        if iteration % print_iterations == 0 or iteration == 1:
            cost = cost_function(theta, X_train, y)
            print("[ Iteration", iteration, "]", "cost =", cost)
            #print(gradient)


num_features = train_X.shape[1]
initial_theta = np.random.randn(num_features)
max_iter = 200
print_iter = 25
alpha_test = 0.1
learned_theta = gradient_descent(initial_theta, train_X, train_y, alpha_test, max_iter, print_iter)
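train_X and train_y come from my dataset, which I haven't included here. For anyone who wants to run the snippet, a toy linearly separable set along the lines of the sketch below can be plugged in (this is purely illustrative, not my actual data):

    import numpy as np

    # Stand-in data: two well-separated Gaussian blobs plus a bias column,
    # so the set is linearly separable like the one described above.
    rng = np.random.default_rng(1)
    n_per_class = 50
    class0 = rng.normal(loc=-2.0, scale=0.5, size=(n_per_class, 2))
    class1 = rng.normal(loc=+2.0, scale=0.5, size=(n_per_class, 2))
    train_X = np.hstack([np.ones((2 * n_per_class, 1)),   # bias column
                         np.vstack([class0, class1])])
    train_y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])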
I don't think this is just slow convergence; on a linearly separable dataset it should converge quickly.
Here is the output:
[ Iteration 1 ] cost = 0.6321735730663283
[ Iteration 25 ] cost = 0.6307985058882454
[ Iteration 50 ] cost = 0.6302278288232466
[ Iteration 75 ] cost = 0.6300077925064239
[ Iteration 100 ] cost = 0.6299228901862299
[ Iteration 125 ] cost = 0.6298894960439918
[ Iteration 150 ] cost = 0.6298756287152963
[ Iteration 175 ] cost = 0.6298691634248297
[ Iteration 200 ] cost = 0.6298655267069331
I cannot figure out what is going on.