Question

I am trying to write code to solve the following problem (as described in HW5 of the Caltech course "Learning from Data"):

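In this problem you will create your own target function f (a probability in this case) and data set D to see how logistic regression works. For simplicity, we will take f to be a 0/1 probability, so y is a deterministic function of x. Take d = 2 so you can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform probability of picking each x ∈ X. Choose a line in the plane as the boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y has to be -1) by taking two random, uniformly distributed points from X and taking the line passing through them as the boundary between y = ±1. Pick N = 100 training points at random from X, and evaluate the output yn for each of these points xn. Run logistic regression with stochastic gradient descent to find g, and estimate Eout (the cross-entropy error) on a separate set of points. Repeat the experiment for 100 runs with different targets and take the average. Initialize the weight vector of logistic regression to all zeros in each run. Stop the algorithm when ||w(t-1) - w(t)|| < 0.01, where w(t) denotes the weight vector at the end of epoch t. An epoch is a full pass through the N data points (use a random permutation of 1, 2, ..., N to present the data points to the algorithm within each epoch, and use different permutations for different epochs). Use a learning rate of 0.01.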
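The setup described above can be sketched in R as follows. This is only an illustration of the data-generating step, not the course's reference code: the names `p1`, `p2`, `w_f`, `X` and `y` are hypothetical, the seed is there only for reproducibility, and the leading column of ones is an assumption that lets the weights include an intercept.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Two random points in [-1, 1] x [-1, 1] define the target boundary.
p1 <- runif(2, min = -1, max = 1)
p2 <- runif(2, min = -1, max = 1)

# Line through p1 and p2 written as w_f . (1, x1, x2) = 0.
# This form also covers near-vertical lines, unlike slope/intercept.
w_f <- c(p1[1] * p2[2] - p2[1] * p1[2],  # constant term
         p1[2] - p2[2],                  # coefficient of x1
         p2[1] - p1[1])                  # coefficient of x2

# N training points, each with a leading 1 so the weights can hold an intercept.
N <- 100
X <- cbind(1, matrix(runif(2 * N, min = -1, max = 1), ncol = 2))

# Labels are +1 on one side of the line and -1 on the other.
y <- sign(as.vector(X %*% w_f))
```

Writing the boundary as a weight vector on (1, x1, x2) keeps the target in the same form as the logistic regression weights, which makes the two easy to compare.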

I need to find which value Eout is closest to for N = 100, and the average number of epochs needed to meet the stopping criterion.

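I wrote and ran the code, but I did not get the right answers (according to the solutions, Eout is close to 0.1 and the number of epochs is close to 350). With a threshold of 0.01 on the change in w, the number of epochs required is far too small (about 10) and the error is far too large (about 2). I then tried the criterion ||w(t-1) - w(t)|| < 0.001 instead of 0.01. With that, the average number of epochs required is about 250 and the out-of-sample error is about 0.35.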
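One sanity check on these numbers, assuming Eout is the mean of log(1 + exp(-y w·x)) as in the code below: with all weights at zero, every point contributes log(2), roughly 0.693, whatever its label. An estimate near 2 is therefore worse than the untrained starting point, which may suggest looking at how the test error is computed and not only at the stopping threshold. The helper name `cross_entropy` and the sample point are hypothetical.

```r
# Cross-entropy error of a single point (x, y) under weights w.
cross_entropy <- function(w, x, y) log(1 + exp(-y * sum(w * x)))

# With zero weights the error does not depend on x or y.
x_example <- c(1, 0.3, -0.7)  # arbitrary point with a leading 1
e_pos <- cross_entropy(c(0, 0, 0), x_example, +1)
e_neg <- cross_entropy(c(0, 0, 0), x_example, -1)
print(c(e_pos, e_neg, log(2)))
```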

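Is there something wrong with my code or approach, or is the provided answer wrong? I have added comments to indicate what I intend to do at each step. Thanks in advance.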

library(pracma)

h<- 0 # h will later be updated to number of required epochs

p<- 0 # p will later be updated to Eout

C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error

d <- matrix(ncol=10000, nrow=1)

for(i in 1:10000){
  C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
  d[1, i] <- sign(C[2, i] - f(C[1, i])) 
}

for(g in 1:100){ # 100 runs of the experiment

  x <- runif(2, min = -1, max = 1)

  y <- runif(2, min = -1, max = 1)

  fit = (lm(y~x))

  t <- summary(fit)$coefficients[,1] 

  f <- function(x){   # Target function
    t[2]*x + t[1]
  }

  A <- matrix(ncol=100, nrow=2) # Sample data

  b <- matrix(ncol=100, nrow=1)

  norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator

  w <- c(0,0) # weights initialized to zero

  for(i in 1:100){

    A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data

    b[1, i] <- sign(A[2, i] - f(A[1, i])) 
  }

  q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch

  l= 1

  while(l < 1001){

    E <- function(z){ # cross entropy error function

      x = z[1]

      y = z[2]

      v = z[3]

      return(log(1 + exp(-v*t(w)%*%c(x, y))))
    }

    err <- function(xn1, xn2, yn){ #gradient of error function

      return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
    }

    e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point

    e[,1:100] = 0 

    perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices

    for(j in 1:100){ # One complete Epoch

      r = A[,perm[j]] # pick the perm[j]th entry in A

      s = b[perm[j]]  # pick the perm[j]th entry in b

      e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error

      w = w - 0.01*e[,perm[j]] # update the weight vector according to the formula involving step size and gradient
    }

    q[,l] = w # the lth entry is the weight vector at the end of the lth epoch

    if(l > 1 && norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm

      break
    }
    l = l+1 # move to the next epoch
  }

  for(n in 1:10000){

    p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
  }

  h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment

}

mean(h) # Mean number of epochs needed 

mean(p) # average Eout, over 100 experiments
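A few places where the code above seems to depart from the problem statement: the weight vector has no intercept component, although the target line generally does not pass through the origin; the gradient uses the elementwise `*` where an inner product is presumably intended; the test labels `d` are computed before `f` is defined for the current run; and `p[g]` is overwritten on every pass of the final loop, so it ends up holding the error of the last test point only. The sketch below is one way the experiment might be restructured under those assumptions. The helper names (`random_target`, `make_data`, `grad_point`, `run_once`) and the `max_epochs` safety cap are hypothetical, and the sketch has not been checked against the course's reference answer.

```r
# Line through two random points, as weights on (1, x1, x2).
random_target <- function() {
  p1 <- runif(2, -1, 1)
  p2 <- runif(2, -1, 1)
  c(p1[1] * p2[2] - p2[1] * p1[2], p1[2] - p2[2], p2[1] - p1[1])
}

# n uniform points in [-1, 1]^2 with a leading 1, labelled by the target w_f.
make_data <- function(n, w_f) {
  X <- cbind(1, matrix(runif(2 * n, -1, 1), ncol = 2))
  list(X = X, y = sign(as.vector(X %*% w_f)))
}

# Gradient of log(1 + exp(-y * w.x)) with respect to w.
grad_point <- function(w, x, y) -y * x / (1 + exp(y * sum(w * x)))

# One run: SGD until the weights move less than tol over a full epoch.
run_once <- function(n_train = 100, n_test = 10000, eta = 0.01,
                     tol = 0.01, max_epochs = 2000) {
  w_f <- random_target()
  train <- make_data(n_train, w_f)
  test <- make_data(n_test, w_f)  # test labels use this run's target
  w <- c(0, 0, 0)                 # intercept plus two coordinates
  epochs <- 0
  repeat {
    w_prev <- w
    for (i in sample(n_train)) {  # fresh random permutation each epoch
      w <- w - eta * grad_point(w, train$X[i, ], train$y[i])
    }
    epochs <- epochs + 1
    if (sqrt(sum((w - w_prev)^2)) < tol || epochs >= max_epochs) break
  }
  # Mean cross-entropy over the whole test set, computed in one step.
  e_out <- mean(log(1 + exp(-test$y * as.vector(test$X %*% w))))
  c(e_out = e_out, epochs = epochs)
}

set.seed(42)  # assumption: fixed seed for reproducibility
n_runs <- 100
runs <- replicate(n_runs, run_once())  # 2 x n_runs matrix
rowMeans(runs)                         # average Eout and average epochs
```

Generating the test set inside `run_once` keeps its labels tied to the same target as the training data, and computing the test error with a single `mean` avoids the overwrite in the per-point loop.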

Machine learning: stochastic gradient descent for logistic regression in R: computing Eout and the average number of epochs

0 answers: