Computing logistic regression coefficients by hand

Time: 2020-06-17 13:51:16

Tags: r logistic-regression nonlinear-optimization

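I am trying to estimate a logistic regression in R, computing everything by hand. I can write the logit and log-likelihood functions, but I cannot get a nonlinear solver to optimize them.

I would appreciate any advice.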

library(readr)  # provides read_csv()

df <- read_csv("http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/RProgramming/data/Default.csv")
df


df$default = ifelse(df$default == "Yes", 1, 0)

logit <- function(x, b0, b1) {
  1/(1 + exp(-b0 - b1*x))
}



Loglikel <- function(y, x, b0, b1) {
  b0 = rep(b0, length(y))
  b1 = rep(b1, length(y))
  p <- logit(x, b0, b1)
  sum(y*log(p)  + (1 - y)*log(1-  p))
}


Loglikel(df$default, df$balance, -10, 0.005)


library(stats4)

mle(Loglikel, 
    start = list(b0 = 0, b1 = 0), 
    fixed = list(y = df$default, x = df$balance))


1 Answer:

Answer 0 (score: 1)

I took your code and modified it slightly so that the parameters are passed as a vector:

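library(readr)  # provides read_csv()

df <- read_csv("http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/RProgramming/data/Default.csv")
# Recode a Yes/No column to 1/0. Note that this line reads df$student,
# whereas the question recodes df$default.
df$default <- ifelse(df$student == "Yes", 1, 0)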

logit <- function(x, b0, b1) {
  1/(1 + exp(-b0-b1*x))
}
# Bernoulli log-likelihood, with the parameters passed as par = c(b0, b1)
Loglikel <- function(par, y, x) {
  p <- logit(x, par[1], par[2])
  sum(y * log(p) + (1 - y) * log(1 - p))
}
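Before handing a hand-written log-likelihood to an optimizer, it can help to check it against a reference value. The sketch below does this on a tiny made-up data set (the vectors `x` and `y`, the test point `par_test`, and the function names are illustrative assumptions, not part of the original code). It compares the manual formula with the Bernoulli log-density from `dbinom` and `plogis`:

```r
# Toy data, made up for illustration only
x <- c(0, 1, 2, 3, 4)
y <- c(0, 0, 1, 0, 1)

# Same form as the hand-written log-likelihood, with par = c(b0, b1)
loglik_manual <- function(par, y, x) {
  p <- 1 / (1 + exp(-par[1] - par[2] * x))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Reference: Bernoulli log-density with success probability plogis(b0 + b1 * x)
loglik_reference <- function(par, y, x) {
  sum(dbinom(y, size = 1, prob = plogis(par[1] + par[2] * x), log = TRUE))
}

par_test <- c(-1, 0.5)  # arbitrary test point
ll_manual <- loglik_manual(par_test, y, x)
ll_reference <- loglik_reference(par_test, y, x)

# TRUE when the two computations agree up to floating-point tolerance
all.equal(ll_manual, ll_reference)
```

If the two values differ, the problem is in the likelihood itself rather than in the solver.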

We are now ready to use a nonlinear solver (for example nlm) to obtain the parameter estimates:

nlm_fit <- nlm(Loglikel, p = c(-2,0.001), x=df$balance, y=df$default)

which gives

> nlm_fit
...
$estimate
[1] -2.0002960 -0.2666521
...

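nlm uses a Newton-type solver to minimize the objective function it is given. glm, on the other hand, uses an iteratively reweighted least squares algorithm, which means that the outputs of glm and nlm do not have to agree: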

glm_fit <- glm(default ~ balance, family = binomial(link="logit"), data = df)

> glm_fit

Call:  glm(formula = default ~ balance, family = binomial(link = "logit"), 
    data = df)

Coefficients:
(Intercept)      balance  
 -1.7004224    0.0009409 
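Because nlm minimizes its objective, a maximum likelihood fit is usually obtained by passing it the negative log-likelihood. The same applies to `stats4::mle`, whose first argument, `minuslogl`, is a negative log-likelihood written as a function of the parameters only. Here is a minimal sketch of that setup, assuming simulated data rather than Default.csv. The names `neg_loglik`, `neg_loglik_mle`, `nlm_sim`, `mle_sim` and `glm_sim` are illustrative:

```r
library(stats4)

# Simulated data with known coefficients (an assumption for illustration)
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))

# Negative Bernoulli log-likelihood with par = c(b0, b1).
# Uses the identity y*log(p) + (1-y)*log(1-p) = y*eta - log(1 + exp(eta)).
neg_loglik <- function(par, y, x) {
  eta <- par[1] + par[2] * x
  -sum(y * eta - log1p(exp(eta)))
}

# nlm minimizes, so it receives the negative log-likelihood
nlm_sim <- nlm(neg_loglik, p = c(0, 0), y = y, x = x)

# mle wants a function of the parameters only; the data come from the enclosing scope
neg_loglik_mle <- function(b0, b1) neg_loglik(c(b0, b1), y = y, x = x)
mle_sim <- mle(neg_loglik_mle, start = list(b0 = 0, b1 = 0))

# Reference fit
glm_sim <- glm(y ~ x, family = binomial(link = "logit"))

nlm_sim$estimate
coef(mle_sim)
coef(glm_sim)
```

On well-scaled data like this, the three sets of estimates would be expected to agree closely. With a predictor on a scale of thousands, such as balance, a general-purpose optimizer may need rescaled inputs or better starting values to get there.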

See this link for a good summary of what happens inside glm.
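To give a rough idea of iteratively reweighted least squares, here is a minimal sketch on simulated data. It is not glm's actual implementation. The data, the fixed cap of 25 iterations, the convergence threshold, and the names `X`, `beta`, `w` and `z` are all assumptions made for illustration:

```r
# Simulated data with known coefficients (an assumption for illustration)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.5 - 1.5 * x))

X <- cbind(1, x)   # design matrix with an intercept column
beta <- c(0, 0)    # starting values

for (iter in 1:25) {
  eta <- drop(X %*% beta)    # linear predictor
  mu <- plogis(eta)          # fitted probabilities
  w <- mu * (1 - mu)         # weights
  z <- eta + (y - mu) / w    # working response
  # Weighted least squares step: solve (X' W X) beta = X' W z
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

irls_coef <- unname(beta)
glm_coef <- unname(coef(glm(y ~ x, family = binomial(link = "logit"))))

irls_coef
glm_coef
```

Each pass is a weighted linear regression of the working response `z` on `X`. For the logit link this is the same update as a Newton-Raphson step on the log-likelihood.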