Asking for review and help with implementing a logistic regression model

Date: 2019-04-13 10:46:15

Tags: java scala logistic-regression

I'm trying to implement a simple logistic regression model in Scala, i.e. one with only a single independent variable for the binary dependent variable. If we take the example from Wikipedia: https://en.wikipedia.org/wiki/Logistic_regression (under "Probability of passing an exam versus hours of study") and try to obtain the same theta0 and theta1 coefficients with my current code, we get close.

When the if statement in the gradientDescent method is commented out, the run uses the maximum number of iterations and ends with theta0 = -4.360295851109452 and theta1 = 1.5166246438642796. That is very close to the Wikipedia example, where theta0 = −4.0777 and theta1 = 1.5046.

With the if statement left in, theta0 = 0.0 and theta1 = 0.0 after only a single iteration, which means the return happens immediately. I'm not sure why that is. I'm not even sure how far I am from a working model in general.
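
For reference, the stopping rule I was trying to express is the usual one: iterate until both parameters change by less than the tolerance in a single step. A minimal sketch of that check (the name hasConverged is mine, not part of the class below):

def hasConverged(oldTheta0: Double, newTheta0: Double,
                 oldTheta1: Double, newTheta1: Double,
                 epsilon: Double): Boolean =
  math.abs(newTheta0 - oldTheta0) < epsilon &&
  math.abs(newTheta1 - oldTheta1) < epsilon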

In general, I'm trying to implement what is shown here: https://www.internalpointers.com/post/cost-function-logistic-regression. As I understand it, once gradient descent has found the optimal thetas, the resulting S-curve can be fitted to the original data points.
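
Writing out what that article describes (in LaTeX; m is the number of data points, h_\theta is the sigmoid hypothesis, \alpha is the learning rate):

h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\bigl(1 - h_\theta(x_i)\bigr)\right]

\theta_0 := \theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr) \qquad \theta_1 := \theta_1 - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)\,x_i

The deriveTheta0 and deriveTheta1 methods in my code compute the whole -\frac{\alpha}{m}\sum(\cdot) term, which is why gradientDescent updates with theta = theta + deriveTheta rather than subtracting.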



import scala.collection.mutable.Buffer
import scala.collection.mutable.ArrayBuffer
import scala.math.exp
import scala.math.abs



object Testing extends App {

  val logistic = new LogisticRegressionCalculator
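  // Pass/fail outcomes and study hours from the Wikipedia example:
  // xData runs from 0.5 to 5.25 hours in steps of 0.25.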
  val yData    = Array(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1).map(z => z.toDouble)
  val xData    = Array.tabulate(20)({a => 0.5 + a * 0.25})
  val thetas   = logistic.deriveThetas(yData, xData)
  println(thetas)
}


class LogisticRegressionCalculator {

  //Learning rate
  private var alpha   = 0.01
  //Tolerance
  private var epsilon = 10E-10
  //Number of iterations
  private var maxIterations = 1000000

  def changeAlpha(newAlpha: Double)        = this.alpha = newAlpha
  def changeEpsilon(newEpsilon: Double)    = this.epsilon = newEpsilon
  def changeMaxIterations(newMaxIter: Int) = this.maxIterations = newMaxIter
  def giveAlpha: Double      = this.alpha
  def giveEpsilon: Double    = this.epsilon
  def giveMaxIterations: Int = this.maxIterations
  /*
   * This method handles the simple case where there is only one independent
   * variable for the y values, which are either zero or one.
   * It is assumed that yData and xData are of equal length (checked by the
   * require below).
   */

  def deriveThetas(yData: Array[Double], xData: Array[Double]): Buffer[Double] = {
    require(yData.size == xData.size)
    //The traces below could be used during testing to check that the obtained values make sense
//    val traceTheta0 = Array.ofDim[Double](this.maxIterations)
//    val traceTheta1 = Array.ofDim[Double](this.maxIterations)

    var theta0     = 0.0
    var tempTheta0 = theta0
    var theta1     = 0.0
    var tempTheta1 = theta1
    val dataSize   = yData.size
    var counter    = 0

    //Hypothesis function for logistic regression in the form of sigmoid function
    def hypothesis(z: Double) = 1.0 / (1.0 + exp(-z))

    def deriveTheta0: Double = {
      var sum = 0.0
      for (i <- 0 until dataSize) {
        sum += (hypothesis(theta0 + theta1 * xData(i)) - yData(i)) //implicitly * 1, since the coefficient of theta0 is 1
      }
      return -(this.alpha / dataSize) * sum
    }

    def deriveTheta1: Double = {
      var sum = 0.0
      for (i <- 0 until dataSize) {
        sum += (hypothesis(theta0 + theta1 * xData(i)) - yData(i)) * xData(i)
      }
      return -(this.alpha / dataSize) * sum
    }

    def gradientDescent: (Double, Double, Double) = {
      for (i <- 0 until this.maxIterations) {
        //println(s"Theta0: ${theta0}\tTheta1: ${theta1}")
        counter += 1
        tempTheta0 = theta0 + deriveTheta0
        tempTheta1 = theta1 + deriveTheta1

        //If the change is so minuscule that further iterations are of no use, stop early.
//        if (abs(tempTheta0 - theta0) >= epsilon || abs(tempTheta1 - theta1) >= epsilon) {
//          
//          return(theta0, theta1, counter)
//          
//        }
        theta0     = tempTheta0
        theta1     = tempTheta1
      }
      (theta0, theta1, counter)
    }
    val (finalTheta0, finalTheta1, iterations) = gradientDescent
    Buffer(finalTheta0, finalTheta1, iterations)
  }
}
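
As a sanity check on the run above (assuming the theta values printed with the if statement commented out), the fitted S-curve can be evaluated directly:

object SanityCheck extends App {
  val theta0 = -4.360295851109452
  val theta1 = 1.5166246438642796
  // Fitted probability of passing after a given number of study hours
  def passProbability(hours: Double): Double =
    1.0 / (1.0 + math.exp(-(theta0 + theta1 * hours)))
  println(passProbability(2.0)) // ≈ 0.21: passing after 2 hours of study is unlikely
  println(passProbability(4.0)) // ≈ 0.85: passing after 4 hours of study is likely
}

These probabilities have the same shape as the table in the Wikipedia article, which is why I believe the version without the if statement is close to correct.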

0 Answers:

No answers yet.