机器学习:R中的逻辑回归的随机梯度下降:计算Eout和平均历时数

时间:2018-07-20 11:13:09

标签: r machine-learning logistic-regression gradient-descent

我正在尝试编写代码来解决以下问题(如CalTech课程“从数据学习”中HW5中所述):

  

在此问题中,您将创建自己的目标函数f   (在这种情况下为概率)和数据集D,以了解如何进行Logistic   回归工程。为简单起见,我们将f设为0 = 1   因此y是x的确定性函数。取d = 2   可以可视化问题,并令X = [-1; 1]×[-1; 1]与制服   每个x 2 X的选择概率。在飞机上选择一条线作为   f(x)= 1(其中y必须为+1)和f(x)= 0(其中y   必须为-1),取两个随机且均匀分布的点   X,并将通过它们的线作为y =之间的边界   ±1。从X随机选择N = 100个训练点,并评估   每个点xn的输出yn。使用运行Logistic回归   随机梯度下降以找到g并估计Eout(   熵误差)   点来评估错误。重复实验100次   不同的目标并取平均值。初始化权重向量   每次运行中逻辑回归到全零的概率。停止算法   当| w(t-1)-w(t)| <0:01,其中w(t)表示权重向量   时代结束历元是对N个数据点的完整遍历   (使用1; 2;···N的随机排列表示数据   指向每个时期的算法,并使用不同的   不同时期的排列)。学习率为0.01。

我需要计算N = 100时最接近Eout的值,以及所需条件的平均历元数。

我编写并运行了代码,但没有得到正确的答案(如解决方案中所述,这些结果是Eout接近0.1,时期数接近350)。增量w为0.01所需的历元数太少(大约10),而误差太大(大约2)。然后,我尝试用| w(t-1)-w(t)|代替标准<0.001(而不是0.01)。然后,平均所需的纪元数约为250,样本外误差约为0.35。

我的代码/解决方案是否有问题,或者提供的答案有问题?我添加了注释以指示我打算在每个步骤中执行的操作。预先感谢。

library(pracma)

h<- 0 # h will later be updated to number of required epochs

p<- 0 # p will later be updated to Eout

C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error

d <- matrix(ncol=10000, nrow=1)

for(i in 1:10000){
  C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
  d[1, i] <- sign(C[2, i] - f(C[1, i])) 
}

for(g in 1:100){ # 100 runs of the experiment

  x <- runif(2, min = -1, max = 1)

  y <- runif(2, min = -1, max = 1)

  fit = (lm(y~x))

  t <- summary(fit)$coefficients[,1] 

  f <- function(x){   # Target function
    t[2]*x + t[1]
  }

  A <- matrix(ncol=100, nrow=2) # Sample data

  b <- matrix(ncol=100, nrow=1)

  norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator

  w <- c(0,0) # weights initialized to zero

  for(i in 1:100){

    A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data

    b[1, i] <- sign(A[2, i] - f(A[1, i])) 
  }

  q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch

  l= 1

  while(l < 1001){

    E <- function(z){ # cross entropy error function

      x = z[1]

      y = z[2]

      v = z[3]

      return(log(1 + exp(-v*t(w)%*%c(x, y))))
    }

    err <- function(xn1, xn2, yn){ #gradient of error function

      return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
    }

    e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point

    e[,1:100] = 0 

    perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices

    for(j in 1:100){ # One complete Epoch

      r = A[,perm[j]] # pick the perm[j]th entry in A

      s = b[perm[j]]  # pick the perm[j]th entry in b

      e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error

      w = w - 0.01*e[,perm[j]] # update the weight vector accorng to the formula involving step size, gradient
    }

    q[,l] = w # the lth entry is the weight vector at the end of the lth epoch

    if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm

      break
    }
    l = l+1 # move to the next epoch
  }

  for(n in 1:10000){

    p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
  }

  h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment

}

mean(h) # Mean number of epochs needed 

mean(p) # average Eout, over 100 experiments

0 个答案:

没有答案