I am trying to implement a logistic regression model with regularization. I am stuck on the gradient computation, because when I run the gradient descent algorithm it actually shows the cost function increasing rather than decreasing.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Probability(theta, X):
    return sigmoid(np.dot(X, theta))

def cost_function_regression(theta, x, y, Lambda):
    # Computes the cost function for all the training samples
    m = x.shape[0]
    total_cost = (-(1 / m) * np.sum(
        np.dot(y.T, np.log(Probability(theta, x))) + np.dot((1 - y).T, np.log(
            1 - Probability(theta, x))))) + (Lambda / 2) * np.sum(np.dot(theta, theta.T))
    return total_cost

def Gradient_regression(theta, X, y, Lambda):
    # Gradient of the regularized cost over all training samples
    m = X.shape[0]
    grad = ((1 / m) * np.dot(X.T, Probability(theta, X) - y)) + np.sum((Lambda / m) * theta)
    return grad
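For context, this is the kind of loop used to drive those functions (a reconstruction: the data, loop, and hyperparameters below are my assumptions, not part of the original post):

# hypothetical data: 100 samples, 3 features plus a bias column
X = np.hstack([np.ones((100, 1)), np.random.randn(100, 3)])
y = (np.random.rand(100, 1) > 0.5).astype(float)
theta = np.zeros((4, 1))
alpha, Lambda = 0.1, 1.0

for _ in range(100):
    theta = theta - alpha * Gradient_regression(theta, X, y, Lambda)
    # watch this value across iterations; the question reports it increasing
    print(cost_function_regression(theta, X, y, Lambda))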
Answer 0 (score: 0)
We will first build up the theory, then a working example, and finally give some comments.
The steps for fitting/training a logistic regression model (as with any supervised ML model) using gradient descent are as follows:

1. Identify a hypothesis function [h(X)], and initialize its parameters [w, b].
2. Identify a cost function [J(w, b)].
3. Forward propagation: make predictions with the hypothesis function [y_hat = h(X)].
4. Measure the error between the actual labels [y] and the predicted labels [y_hat].
5. Backward propagation: adjust the parameters of the hypothesis function according to the error (by computing the gradients), using the update rule.
6. Go to step 3 if the gradients are still large, otherwise stop.

A minimal toy version of this loop is sketched right below; the concrete, regularized example for our problem follows after the math.
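Here is the toy sketch of steps 3 through 6 (the one-feature data and names are mine, and regularization is omitted for brevity):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)                         # one feature, assumed toy data
y = (X > 0).astype(float)

w, b, alpha = 0.0, 0.0, 0.1                      # step 1: initialize the parameters of h(X)
for _ in range(5000):                            # capped loop over steps 3-6
    y_hat = 1 / (1 + np.exp(-(w * X + b)))       # step 3: forward propagation
    dw = -np.mean((y - y_hat) * X)               # step 5: gradients of the log loss J (step 2)
    db = -np.mean(y - y_hat)
    w, b = w - alpha * dw, b - alpha * db        # the update rule
    if abs(dw) + abs(db) < 1e-3:                 # step 6: stop when the gradients are small
        break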
The hypothesis function used for logistic regression:

h(X) = sigmoid(w.X + b) = 1 / (1 + e^(-(Σ_i w_i * X^i + b)))

where X is a vector and X^i is the i-th element of the vector.
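Numerically, for a single input vector (the values below are made up for illustration):

import numpy as np

w = np.array([0.2, -0.5])                   # parameters of the hypothesis
b = 0.1
X = np.array([1.0, 2.0])                    # X^1 = 1.0, X^2 = 2.0
h = 1 / (1 + np.exp(-(np.dot(w, X) + b)))   # h(X) = sigmoid(w.X + b)
print(h)                                    # a probability in (0, 1)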
The loss function commonly used for logistic regression is the log loss. With l2 regularization (regularization strength λ), the log loss over n training samples is:

J(w, b) = -(1/n) * Σ_{i=1..n} [ y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) ] + λ * Σ_j w_j²

(in the example below, λ = 1 and the bias b is penalized in the same way as the weights).
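In NumPy this loss can be written as follows (a sketch; the function name log_loss_l2 and the vectorized form are mine, not from the original answer):

import numpy as np

def log_loss_l2(w, b, X, y, lam):
    # predicted probabilities y_hat = sigmoid(X.w + b)
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))
    # cross-entropy term, averaged over the n samples
    ce = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # l2 penalty on the weights
    return ce + lam * np.sum(w ** 2)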
Let us calculate the gradient with respect to a weight w_j:

∂J/∂w_j = -(1/n) * Σ_{i=1..n} (y_i - y_hat_i) * X_i^j + 2λ * w_j

where X_i^j is the j-th element of the i-th training vector.
Similarly, for the bias:

∂J/∂b = -(1/n) * Σ_{i=1..n} (y_i - y_hat_i) + 2λ * b
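Written vectorized (again a sketch of mine; the worked example below instead keeps the weights as separate scalars and takes λ = 1):

import numpy as np

def gradients(w, b, X, y, lam):
    # X: (n, d) matrix of training vectors, y: (n,) labels
    n = X.shape[0]
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))        # forward pass
    dw = -(X.T @ (y - y_hat)) / n + 2 * lam * w   # dJ/dw_j
    db = -np.mean(y - y_hat) + 2 * lam * b        # dJ/db (b penalized, as in the example)
    return dw, db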
Now that we know the gradients, let us write a gradient descent algorithm to fit the parameters of our logistic regression model:
import math
import numpy as np
from sklearn import datasets

# load data
iris = datasets.load_iris()

# Lets take only two classes
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]

# Normalize data to 0 mean and 1 std
X[:, 0] = (X[:, 0] - np.mean(X[:, 0])) / np.std(X[:, 0])
X[:, 1] = (X[:, 1] - np.mean(X[:, 1])) / np.std(X[:, 1])
X[:, 2] = (X[:, 2] - np.mean(X[:, 2])) / np.std(X[:, 2])
X[:, 3] = (X[:, 3] - np.mean(X[:, 3])) / np.std(X[:, 3])

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# initialize weights
w0, w1, w2, w3, b = 0.01, 0.01, 0.01, 0.01, 0.01
n = len(X)
# Learning rate
alpha = 0.01

# The gradient descent loop
while True:
    # forward propagation: predicted probabilities for all samples
    y_hat = [sigmoid(w0*x[0] + w1*x[1] + w2*x[2] + w3*x[3] + b) for x in X]
    # backward propagation: gradients of the regularized log loss (lambda = 1)
    delta_w0 = -np.sum([(y[j] - y_hat[j]) * X[j, 0] for j in range(n)]) / n + 2*w0
    delta_w1 = -np.sum([(y[j] - y_hat[j]) * X[j, 1] for j in range(n)]) / n + 2*w1
    delta_w2 = -np.sum([(y[j] - y_hat[j]) * X[j, 2] for j in range(n)]) / n + 2*w2
    delta_w3 = -np.sum([(y[j] - y_hat[j]) * X[j, 3] for j in range(n)]) / n + 2*w3
    delta_b = -np.sum([(y[j] - y_hat[j]) for j in range(n)]) / n + 2*b
    # the update rule
    w0 = w0 - alpha*delta_w0
    w1 = w1 - alpha*delta_w1
    w2 = w2 - alpha*delta_w2
    w3 = w3 - alpha*delta_w3
    b = b - alpha*delta_b
    # stop when the gradients are small
    if np.sum(np.abs([delta_w0, delta_w1, delta_w2, delta_w3, delta_b])) < 1e-5:
        break

# Make predictions
pred = [1 if i > 0.5 else 0 for i in y_hat]
# Find number of correct predictions
correct = np.sum([1 if pred[i] == y[i] else 0 for i in range(n)])
print(correct)
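As a side note (my addition, not from the original answer): a quick sanity check of the fit is to compare against scikit-learn's LogisticRegression on the same normalized X and y; its C parameter is the inverse of the regularization strength:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=1.0).fit(X, y)   # C is the inverse regularization strength
print(clf.score(X, y))                      # fraction of correctly classified samples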