I'm trying to implement a simple logistic regression model in Scala, i.e. one with a single independent variable for the dependent binary variable. If we take the example from Wikipedia: https://en.wikipedia.org/wiki/Logistic_regression (under "Probability of passing an exam versus hours of study") and try to obtain the same theta0 and theta1 coefficients with my current code, we get close.
When the if statement in the gradientDescent method is commented out, I get theta0 = -4.360295851109452 and theta1 = 1.5166246438642796 after the maximum number of iterations. That is quite close to the Wikipedia example, where theta0 = −4.0777 and theta1 = 1.5046.
With the if statement left in, I get theta0 = 0.0 and theta1 = 0.0 after only one iteration, meaning the return fires immediately. I'm not sure why that happens, and I'm not even sure how far I am from a working model otherwise.
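For clarity, the behaviour I want from that check is to stop only once both parameter updates have become negligible, i.e. something along these lines (a sketch of my intent, not necessarily what the code below does):

if (abs(tempTheta0 - theta0) < epsilon && abs(tempTheta1 - theta1) < epsilon) {
  return (tempTheta0, tempTheta1, counter)
}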
In general, I'm trying to implement what is shown here: https://www.internalpointers.com/post/cost-function-logistic-regression. As I understand it: once gradient descent yields the optimal thetas, we can fit an S-curve to the original data points.
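In symbols, the update rule I believe I'm implementing is (my own transcription, with h the sigmoid hypothesis, m the number of data points, and x_0 = 1 for the intercept term):

\[
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}
\]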
import scala.collection.mutable.Buffer
import scala.collection.mutable.ArrayBuffer
import scala.math.exp
import scala.math.abs
object Testing extends App {
val logistic = new LogisticRegressionCalculator
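//Pass/fail outcomes taken from the Wikipedia "hours of study" example;
//the hours are approximated by an evenly spaced grid from 0.5 to 5.25 in steps of 0.25.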
val yData = Array(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1).map(z => z.toDouble)
val xData = Array.tabulate(20)({a => 0.5 + a * 0.25})
val thetas = logistic.deriveThetas(yData, xData)
println(thetas)
}
class LogisticRegressionCalculator {
//Learning rate
private var alpha = 0.01
//Tolerance
private var epsilon = 10E-10
//Number of iterations
private var maxIterations = 1000000
def changeAlpha(newAlpha: Double) = this.alpha = newAlpha
def changeEpsilon(newEpsilon: Double) = this.epsilon = newEpsilon
def changeMaxIterations(newMaxIter: Int) = this.maxIterations = newMaxIter
def giveAlpha: Double = this.alpha
def giveEpsilon: Double = this.epsilon
def giveMaxIterations: Int = this.maxIterations
/*
 * This function handles the simple case where there is only one independent variable
 * for the y values, which are either zero or one.
 * It is assumed that the first (left) argument holds the binary y values and that both
 * arrays have the same length (enforced by the require below).
 */
def deriveThetas(yData: Array[Double], xData: Array[Double]): Buffer[Double] = {
require(yData.size == xData.size)
//Traces below would be used for testing to see if the values obtained make sense
// val traceTheta0 = Array.ofDim[Double](this.maxIterations)
// val traceTheta1 = Array.ofDim[Double](this.maxIterations)
var theta0 = 0.0
var tempTheta0 = theta0
var theta1 = 0.0
var tempTheta1 = theta1
val dataSize = yData.size
var counter = 0
//Hypothesis function for logistic regression in the form of sigmoid function
def hypothesis(z: Double) = 1.0 / (1.0 + exp(-z))
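//Each derive function below returns the full update step for its parameter:
//-alpha times the partial derivative of the cost with respect to that theta,
//averaged over the data set, so the result is simply added to the theta in gradientDescent.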
def deriveTheta0: Double = {
var sum = 0.0
for (i <- 0 until dataSize) {
sum += (hypothesis(theta0 + theta1 * xData(i)) - yData(i)) //implicitly * 1, since the feature multiplying theta0 is the constant 1
}
return -(this.alpha / dataSize) * sum
}
def deriveTheta1: Double = {
var sum = 0.0
for (i <- 0 until dataSize) {
sum += (hypothesis(theta0 + theta1 * xData(i)) - yData(i)) * xData(i)
}
return -(this.alpha / dataSize) * sum
}
def gradientDescent: (Double, Double, Double) = {
for (i <- 0 until this.maxIterations) {
//println(s"Theta0: ${theta0}\tTheta1: ${theta1}")
counter += 1
tempTheta0 = theta0 + deriveTheta0
tempTheta1 = theta1 + deriveTheta1
//Stop early once the changes are so minuscule that further iterations are of no use.
// if (abs(tempTheta0 - theta0) >= epsilon || abs(tempTheta1 - theta1) >= epsilon) {
//
// return(theta0, theta1, counter)
//
// }
theta0 = tempTheta0
theta1 = tempTheta1
}
(theta0, theta1, counter)
}
val temp = gradientDescent
Buffer(temp._1, temp._2, temp._3)
}
}
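For completeness, once sensible thetas come out, I intend to use them along these lines to trace the fitted S-curve (a minimal sketch; UsageSketch and predictProbability are names I made up for illustration, not part of the code above):

import scala.math.exp

object UsageSketch extends App {
  //Coefficients as produced above with the if statement commented out.
  val theta0 = -4.360295851109452
  val theta1 = 1.5166246438642796

  //Probability that y = 1 at a given x, i.e. a point on the fitted S-curve.
  def predictProbability(x: Double): Double =
    1.0 / (1.0 + exp(-(theta0 + theta1 * x)))

  //Evaluate the curve over the same grid as the training data.
  for (i <- 0 until 20) {
    val x = 0.5 + i * 0.25
    println(f"x = $x%4.2f  P(y = 1) = ${predictProbability(x)}%.4f")
  }
}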