I am working on a very simple logistic regression problem, but it will not converge. The dataset is linearly separable, yet the loss never converges to 0.
The loss decreases very slowly and seems to level off at a constant, and the gradient does not converge to 0 either. I have verified the function that computes the gradient (via gradient checking), and it is correct. The cost function should also be correct. Changing the learning rate does not help.
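For reference, the gradient check I mention above was a standard central finite-difference comparison, along the lines of the sketch below (the helper names and the toy data in this sketch are only illustrative, not my actual script):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def cost(theta, X, y):
        p = sigmoid(X @ theta)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def grad(theta, X, y):
        return X.T @ (sigmoid(X @ theta) - y) / len(y)

    def check_gradient(theta, X, y, eps=1e-7):
        # Compare the analytic gradient with a central finite difference,
        # one coordinate of theta at a time.
        numeric = np.zeros_like(theta)
        for j in range(len(theta)):
            step = np.zeros_like(theta)
            step[j] = eps
            numeric[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
        return np.max(np.abs(numeric - grad(theta, X, y)))

    # Illustrative toy data, just to exercise the check.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
    theta = rng.normal(size=3)
    print(check_gradient(theta, X, y))  # max difference should be tiny, on the order of 1e-8 or less

Here is my actual code: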
import random
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(z):
    """ Sigmoid function """
    s = 1 / (1 + np.exp(-z))
    return s


def cost_function(theta, X, y):
    output = sigmoid(np.dot(X, theta))
    cost = 0
    m_samples = len(y)
    for i in range(m_samples):
        if y[i] == 0:
            cost += -(1 - y[i]) * np.log(1 - output[i])
        elif y[i] == 1:
            cost += -y[i] * np.log(output[i])
    cost /= m_samples
    return cost


def gradient_update(theta, X, y):
    output = sigmoid(np.dot(X, theta))
    grad = np.dot((output - y).T, X)
    grad = grad / m_samples
    return grad


def gradient_descent(theta, X, y, alpha, max_iterations, print_iterations):
    m_samples = len(y)
    iteration = 0
    X_train = X / np.max(X)
    while (iteration < max_iterations):
        iteration += 1
        gradient = gradient_update(theta, X_train, y)
        theta = theta - alpha * gradient
        if iteration % print_iterations == 0 or iteration == 1:
            cost = cost_function(theta, X_train, y)
            print("[ Iteration", iteration, "]", "cost =", cost)
            #print(gradient)


num_features = train_X.shape[1]
initial_theta = np.random.randn(num_features)
max_iter = 200
print_iter = 25
alpha_test = 0.1
learned_theta = gradient_descent(initial_theta, train_X, train_y, alpha_test, max_iter, print_iter)
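train_X and train_y come from my dataset, which I haven't included here. For anyone who wants to run the snippet, a toy linearly separable set along the lines of the sketch below can be plugged in (this is purely illustrative, not my actual data):

    import numpy as np

    # Stand-in data: two well-separated Gaussian blobs plus a bias column,
    # so the set is linearly separable like the one described above.
    rng = np.random.default_rng(1)
    n_per_class = 50
    class0 = rng.normal(loc=-2.0, scale=0.5, size=(n_per_class, 2))
    class1 = rng.normal(loc=+2.0, scale=0.5, size=(n_per_class, 2))
    train_X = np.hstack([np.ones((2 * n_per_class, 1)),   # bias column
                         np.vstack([class0, class1])])
    train_y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])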
I don't think this is just slow convergence; on a linearly separable dataset it should converge quickly.
Here is the output:
[ Iteration 1 ] cost = 0.6321735730663283
[ Iteration 25 ] cost = 0.6307985058882454
[ Iteration 50 ] cost = 0.6302278288232466
[ Iteration 75 ] cost = 0.6300077925064239
[ Iteration 100 ] cost = 0.6299228901862299
[ Iteration 125 ] cost = 0.6298894960439918
[ Iteration 150 ] cost = 0.6298756287152963
[ Iteration 175 ] cost = 0.6298691634248297
[ Iteration 200 ] cost = 0.6298655267069331
I cannot figure out what is going on.