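Short version:
How can a multiclass logistic regression classifier be implemented via gradient descent in R? Can optim() be used when there are more than two labels?
The MATLAB code is: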
function [J, grad] = cost(theta, X, y, lambda)
m = length(y);
J = 0;
grad = zeros(size(theta));
h_theta = sigmoid(X * theta);
J = (-1/m)*sum(y.*log(h_theta) + (1-y).*log(1-h_theta)) +...
(lambda/(2*m))*sum(theta(2:length(theta)).^2);
trans = X';
grad(1) = (1/m)*(trans(1,:))*(h_theta - y);
grad(2:size(theta, 1)) = 1/m * (trans(2:size(trans,1),:)*(h_theta - y) +...
lambda * theta(2:size(theta,1),:));
grad = grad(:);
end
and...
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
m = size(X, 1);
n = size(X, 2);
all_theta = zeros(num_labels, n + 1);
initial_theta = zeros(n+1, 1);
X = [ones(m, 1) X];
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels,
[theta] = ...
fmincg (@(t)(cost(t, X, (y == c), lambda)), ...
initial_theta, options);
all_theta(c,:) = theta';
end
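For comparison, a rough R counterpart of the two MATLAB functions above might look like the sketch below. It assumes optim() with method = "BFGS" as a stand-in for fmincg, and the names grad and one_vs_all, the clamping constant, and the small synthetic three-class dataset are all made up for illustration; none of them come from the original code.

```r
# Sigmoid (logistic) function.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Regularized logistic cost for ONE binary problem; theta is a plain vector.
cost <- function(theta, X, y, lambda) {
  m <- length(y)
  h <- as.vector(sigmoid(X %*% theta))
  h <- pmin(pmax(h, 1e-12), 1 - 1e-12)   # assumption: clamp to avoid log(0)
  -mean(y * log(h) + (1 - y) * log(1 - h)) +
    lambda / (2 * m) * sum(theta[-1]^2)  # intercept is not penalized
}

# Gradient of the cost above, mirroring the MATLAB grad computation.
grad <- function(theta, X, y, lambda) {
  m <- length(y)
  h <- as.vector(sigmoid(X %*% theta))
  g <- as.vector(crossprod(X, h - y)) / m
  g[-1] <- g[-1] + lambda / m * theta[-1]
  g
}

# One-vs-all: fit one binary classifier per label, one row of theta each.
one_vs_all <- function(X, y, labels, lambda, maxit = 50) {
  X1 <- cbind(1, X)                      # add the intercept column
  all_theta <- matrix(0, nrow = length(labels), ncol = ncol(X1))
  for (i in seq_along(labels)) {
    fit <- optim(par = rep(0, ncol(X1)), fn = cost, gr = grad,
                 X = X1, y = as.numeric(y == labels[i]), lambda = lambda,
                 method = "BFGS", control = list(maxit = maxit))
    all_theta[i, ] <- fit$par
  }
  all_theta
}

# Hypothetical toy data: three well-separated clusters in two dimensions.
set.seed(1)
centers <- rbind(c(0, 4), c(-4, -2), c(4, -2))
y <- rep(1:3, each = 30)
X <- centers[y, ] + matrix(rnorm(2 * length(y)), ncol = 2)

all_theta <- one_vs_all(X, y, labels = 1:3, lambda = 0.1)
# Predict the label whose classifier gives the highest score.
pred <- max.col(cbind(1, X) %*% t(all_theta))
mean(pred == y)                          # training accuracy
```

Each call to optim() here only ever sees a coefficient vector for a single label, which is why the matrix of coefficients is assembled row by row outside the optimizer.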
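Long version:
Although it is probably not needed to follow the question, the dataset can be downloaded here; once downloaded and placed in the R working directory, it is loaded as: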
library(R.matlab)
data <- readMat('data.mat')
str(data)
List of 2
$ X: num [1:5000, 1:400] 0 0 0 0 0 0 0 0 0 0 ...
$ y: num [1:5000, 1] 10 10 10 10 10 10 10 10 10 10 ...
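So X is a matrix with 5,000 examples, each containing 400 features, which happen to be the 400 pixels of a 20 x 20 image of a handwritten digit labeled from 1 to 10, for example this 9:
[image of a handwritten 9]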
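Applying a logistic regression algorithm to predict the handwritten digit based on "computer vision" of the values in these 400 pixels presents the additional challenge of not being a binary decision. Optimizing the coefficients with an ad hoc gradient descent loop, as in this R-bloggers example, is unlikely to be efficient.
There is also a nicely worked out example on R-bloggers based on two explanatory variables (features) and a dichotomous outcome. That example uses the optim() R function, which seems to be the way to go.
Even though I have read the documentation, I am having trouble setting up this more complex example, where we have to decide among 10 possible outcomes: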
library(R.matlab)
data <- readMat('data.mat')
X = data$X # These are the values for the pixels in all 5000 examples.
y = data$y # These are the actual correct labels for each example.
y = replace(y, y == 10, 0) # Replacing 10 with 0 for simplicity.
# Defining the sigmoid function for logistic regression.
sigmoid = function(z){
1 / (1 + exp(-z))
}
X = cbind(rep(1, nrow(X)), X) # Adding an intercept or bias term (column of 1's).
# Defining the regularized cost function parametrized by the coefficients.
cost = function(theta){
hypothesis = sigmoid(X%*%theta)
# In "J" below we will need to have 10 columns of y:
y = as.matrix(model.matrix(lm(y ~ as.factor(y))))
m = nrow(y)
lambda = 0.1
# The regularized cost function is:
J = (1/m) * sum(-y * log(hypothesis) - (1 - y) * log(1 - hypothesis)) +
(lambda/(2 * m)) * sum(theta[2:nrow(theta), 1]^2)
J
}
no.pixels_plus1 = ncol(X) # These are the columns of X plus the intercept.
no.digits = length(unique(y)) # These are the number of labels (10).
# coef matrix rows = no. of labels; cols = no. pixels plus intercept:
theta_matrix = t(matrix(rep(0, no.digits*no.pixels_plus1), nrow = no.digits))
cost(theta_matrix) # The initial cost:
# [1] 0.6931472
theta_optim = optim(par = theta_matrix, fn = cost) # This is the PROBLEM step!
Evidently this is incomplete, and it gives me the error message:
Error in X %*% theta : non-conformable arguments
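Notice that X%*%theta_matrix is carried out without any issues. So the problem has to be in the fact that I have 10 classifiers (0 to 9), and that I am forced to create a matrix with 10 y column vectors in order to carry out the operations in the function cost. It is possible that the solution goes through dummy-coding the y vector with a line like the model.matrix() call in my non-working code above, but then again, I don't know that this captures the "one-versus-all" idea. OK, probably not, and maybe that is the issue.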
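To make the dummy-coding point concrete, here is a small sketch on a made-up label vector (the values of y below are hypothetical, not taken from the dataset). With R's default treatment contrasts, model.matrix() on a factor with K levels returns an intercept column of ones plus K - 1 dummy columns, whereas a one-versus-all setup would normally want K indicator columns, one per label. Dropping the intercept from the formula is one way to get those.

```r
# Hypothetical labels with K = 3 classes.
y <- c(0, 1, 2, 1, 0, 2)

# Default coding: "(Intercept)" plus K - 1 dummies (first level is the baseline).
D <- model.matrix(~ as.factor(y))
colnames(D)

# Removing the intercept gives K indicator columns, one per label.
Y <- model.matrix(~ as.factor(y) - 1)
colnames(Y)

# Each row of Y has exactly one 1, marking that example's label.
rowSums(Y)
```

In the first matrix the column of ones means that label 0 never gets its own 0/1 target, so it is probably not what a one-versus-all cost expects.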
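Otherwise, it seems to work in the R-bloggers post with a binary classifier and extremely similar code.
So what is the right syntax for this problem?
Note that I have tried to work it out one digit against all others, but I don't think that makes sense in terms of complexity.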
Answer 0 (score: 1)
The theta you feed to optim must be a vector. You can turn it into a matrix inside the cost function.
See the previous question here: How to get optim working with matrix multiplication inside the function to be maximized in R
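A minimal sketch of that idea, under stated assumptions: the functions cost_all and grad_all, the indicator matrix Y built with outer(), the choice of method = "BFGS", and the toy data are all hypothetical and only meant to show the vector-to-matrix reshaping. The cost sums the regularized logistic loss over one column of coefficients per label.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# theta_vec is the flat vector optim() supplies; rebuild the matrix inside.
cost_all <- function(theta_vec, X, Y, lambda) {
  m <- nrow(X)
  Theta <- matrix(theta_vec, nrow = ncol(X), ncol = ncol(Y))
  H <- sigmoid(X %*% Theta)
  H <- pmin(pmax(H, 1e-12), 1 - 1e-12)   # assumption: clamp to avoid log(0)
  sum(-Y * log(H) - (1 - Y) * log(1 - H)) / m +
    lambda / (2 * m) * sum(Theta[-1, , drop = FALSE]^2)
}

# Matching gradient, flattened back to a vector in the same column-major order.
grad_all <- function(theta_vec, X, Y, lambda) {
  m <- nrow(X)
  Theta <- matrix(theta_vec, nrow = ncol(X), ncol = ncol(Y))
  G <- crossprod(X, sigmoid(X %*% Theta) - Y) / m
  G[-1, ] <- G[-1, ] + lambda / m * Theta[-1, ]
  as.vector(G)
}

# Hypothetical toy data: three labels (0, 1, 2), two features plus intercept.
set.seed(42)
classes <- 0:2
y <- rep(classes, each = 20)
centers <- rbind(c(0, 4), c(-4, -2), c(4, -2))
X <- cbind(1, centers[y + 1, ] + matrix(rnorm(2 * length(y)), ncol = 2))
Y <- outer(y, classes, "==") * 1         # one indicator column per label

theta0 <- rep(0, ncol(X) * ncol(Y))      # a vector, not a matrix
fit <- optim(theta0, cost_all, grad_all, X = X, Y = Y, lambda = 0.1,
             method = "BFGS")

Theta_hat <- matrix(fit$par, nrow = ncol(X))   # back to matrix form
pred <- classes[max.col(X %*% Theta_hat)]
mean(pred == y)                          # training accuracy
```

Because the columns of Theta do not interact in this cost, this should give the same kind of fit as running one binary optimization per label; the reshaping just lets a single optim() call handle all labels.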