I am trying to write code to solve the following problem (as stated in HW5 of the Caltech course "Learning from Data"):
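In this problem you will create your own target function f (a probability in this case) and data set D to see how Logistic Regression works. For simplicity, we will take f to be a 0/1 probability, so y is a deterministic function of x. Take d = 2 so you can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform probability of picking each x ∈ X. Choose a line in the plane as the boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y has to be -1) by taking two random, uniformly distributed points from X and taking the line passing through them as the boundary between y = ±1. Pick N = 100 training points at random from X, and evaluate the output y_n for each of these points x_n. Run Logistic Regression with Stochastic Gradient Descent to find g, and estimate E_out (the cross-entropy error) by generating a sufficiently large, separate set of points to evaluate the error. Repeat the experiment for 100 runs with different targets and take the average. Initialize the weight vector of Logistic Regression to all zeros in each run. Stop the algorithm when |w(t-1) - w(t)| < 0.01, where w(t) denotes the weight vector at the end of epoch t. An epoch is a full pass through the N data points (use a random permutation of 1, 2, ..., N to present the data points to the algorithm within each epoch, and use different permutations for different epochs). Use a learning rate of 0.01.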
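For reference, the quantities in the problem statement can be written out as follows. This is a sketch assuming the usual logistic-regression conventions: labels y_n in {-1, +1}, inputs x_n augmented with a constant coordinate x_0 = 1, η for the learning rate, e_n for the cross-entropy error on a single point, and M for the number of fresh test points. Apart from w(t), N and E_out, these symbols are assumptions and do not appear in the text above.

```latex
\begin{align*}
e_n(\mathbf{w}) &= \ln\!\left(1 + e^{-y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}\right), \\
\nabla e_n(\mathbf{w}) &= \frac{-\,y_n \mathbf{x}_n}{1 + e^{\,y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}}, \\
\mathbf{w} &\leftarrow \mathbf{w} - \eta\,\nabla e_n(\mathbf{w}), \qquad \eta = 0.01, \\
\text{stop when}\quad &\lVert \mathbf{w}(t-1) - \mathbf{w}(t) \rVert < 0.01, \\
E_{\text{out}} &\approx \frac{1}{M}\sum_{m=1}^{M} e_m(\mathbf{w}) \quad \text{on } M \text{ fresh test points.}
\end{align*}
```

The second line follows from the first by the chain rule, after multiplying numerator and denominator by e^{y_n w^T x_n}.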
I need to find the value closest to E_out for N = 100, and the average number of epochs needed to meet the stopping criterion.
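I wrote and ran the code but did not get the correct answers (according to the solutions, E_out is close to 0.1 and the number of epochs is close to 350). With the 0.01 threshold on the change in w, the number of epochs required is far too small (about 10) and the error is far too large (about 2). I then tried the criterion |w(t-1) - w(t)| < 0.001 instead of 0.01. With that, the average number of epochs required is about 250 and the out-of-sample error is about 0.35.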
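Is there something wrong with my code/solution, or with the provided answers? I have added comments to indicate what I intend to do at each step. Thanks in advance.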
library(pracma)
h <- 0 # h will later be updated to number of required epochs
p <- 0 # p will later be updated to Eout
C <- matrix(ncol = 10000, nrow = 2) # Testing set, used to calculate out of sample error
d <- matrix(ncol = 10000, nrow = 1)
for (i in 1:10000) {
  C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
  d[1, i] <- sign(C[2, i] - f(C[1, i]))
}
for (g in 1:100) { # 100 runs of the experiment
  x <- runif(2, min = -1, max = 1)
  y <- runif(2, min = -1, max = 1)
  fit <- lm(y ~ x)
  t <- summary(fit)$coefficients[, 1]
  f <- function(x) { # Target function
    t[2] * x + t[1]
  }
  A <- matrix(ncol = 100, nrow = 2) # Sample data
  b <- matrix(ncol = 100, nrow = 1)
  norm_vec <- function(x) { sqrt(sum(x^2)) } # vector norm calculator
  w <- c(0, 0) # weights initialized to zero
  for (i in 1:100) {
    A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
    b[1, i] <- sign(A[2, i] - f(A[1, i]))
  }
  q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch
  l <- 1
  while (l < 1001) {
    E <- function(z) { # cross entropy error function
      x <- z[1]
      y <- z[2]
      v <- z[3]
      return(log(1 + exp(-v * t(w) %*% c(x, y))))
    }
    err <- function(xn1, xn2, yn) { # gradient of error function
      return(c(-yn * xn1, -yn * xn2) * (exp(-yn * t(w) * c(xn1, xn2)) / (1 + exp(-yn * t(w) * c(xn1, xn2)))))
    }
    e <- matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point
    e[, 1:100] <- 0
    perm <- sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices
    for (j in 1:100) { # One complete epoch
      r <- A[, perm[j]] # pick the perm[j]th entry in A
      s <- b[perm[j]] # pick the perm[j]th entry in b
      e[, perm[j]] <- err(r[1], r[2], s) # Gradient of the error
      w <- w - 0.01 * e[, perm[j]] # update the weight vector according to the formula involving step size and gradient
    }
    q[, l] <- w # the lth entry is the weight vector at the end of the lth epoch
    if (l > 1 && norm_vec(q[, l] - q[, l - 1]) < 0.001) { # given criterion to terminate the algorithm
      break
    }
    l <- l + 1 # move to the next epoch
  }
  for (n in 1:10000) {
    p[g] <- mean(E(c(C[1, n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
  }
  h[g] <- l # gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment
}
mean(h) # Mean number of epochs needed
mean(p) # average Eout, over 100 experiments
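For comparison, here is a minimal sketch of how a single run of the experiment could be laid out. It is not the course's reference solution, and it rests on assumptions that differ from the code above: every input carries a constant first coordinate x0 = 1 (so w has three entries and the boundary need not pass through the origin), test labels are drawn from the same target as the training labels of that run, inner products are computed with sum(w * x), and E_out is the mean over all test points. The names run_once, draw, label and max_epochs are hypothetical.

```r
# Hypothetical helper: one run of the experiment for a fresh random target.
# Assumption: inputs are stored as rows (1, x1, x2), i.e. with a bias coordinate.
run_once <- function(N = 100, n_test = 10000, eta = 0.01, tol = 0.01,
                     max_epochs = 10000) {
  # Target: the line through two random points p1, p2 in [-1, 1]^2.
  p1 <- runif(2, -1, 1)
  p2 <- runif(2, -1, 1)

  # Label each row by its side of the line (sign of a 2-D cross product).
  label <- function(X) {
    s <- sign((p2[1] - p1[1]) * (X[, 3] - p1[2]) -
              (p2[2] - p1[2]) * (X[, 2] - p1[1]))
    ifelse(s == 0, 1, s)  # points exactly on the line are put on the +1 side
  }

  # Draw n uniform points in [-1, 1]^2 and prepend the constant coordinate.
  draw <- function(n) cbind(1, matrix(runif(2 * n, -1, 1), ncol = 2))

  X <- draw(N)
  y <- label(X)
  w <- c(0, 0, 0)  # weights start at zero in every run

  epochs <- 0
  repeat {
    w_prev <- w
    for (i in sample(N)) {  # a fresh random permutation for each epoch
      margin <- y[i] * sum(w * X[i, ])
      grad <- -y[i] * X[i, ] / (1 + exp(margin))  # gradient of log(1 + exp(-margin))
      w <- w - eta * grad
    }
    epochs <- epochs + 1
    # Stop on a small change in w over the epoch; max_epochs is a safety cap.
    if (sqrt(sum((w - w_prev)^2)) < tol || epochs >= max_epochs) break
  }

  # Out-of-sample cross-entropy error, averaged over all test points.
  X_test <- draw(n_test)
  y_test <- label(X_test)
  e_out <- mean(log(1 + exp(-y_test * as.vector(X_test %*% w))))

  list(w = w, epochs = epochs, e_out = e_out)
}

set.seed(1)
n_runs <- 100  # the problem statement averages over 100 targets
runs <- replicate(n_runs, unlist(run_once()[c("epochs", "e_out")]))
rowMeans(runs)  # average number of epochs and average E_out
```

Keeping the target, the training labels and the test labels inside one function means they cannot drift apart between runs. The pure-R inner loop is slow, so a smaller n_runs is convenient while experimenting.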