Logistic regression for nonlinear data

Time: 2020-09-17 05:25:34

Tags: r regression logistic-regression

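I have data with a continuous independent variable and a binary dependent variable, so I am trying to analyze it with logistic regression. However, unlike the classic case with a single S-shaped transition, my data have two transitions. Here is an example of what I mean: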

library(ggplot2)
library(visreg)

classic.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                          y = c(rep(0, times = 14), 1, 0, rep(1, times = 14)))

model.classic = glm(formula = y ~ x,
                    data = classic.data,
                    family = "binomial")

summary(model.classic)

visreg(model.classic,
       partial = FALSE,
       scale = "response",
       alpha = 0)

[Figure: Classical data]

my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

model.my = glm(formula = y ~ x,
                    data = my.data,
                    family = "binomial")

summary(model.my)

visreg(model.my,
       partial = FALSE,
       scale = "response",
       alpha = 0)

[Figure: My data]

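In both plots, the blue line is the glm fit and the red line is what I would like to get. Is there a way to fit a logistic regression to these data, or should I use a different kind of regression analysis?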

1 Answer:

Answer 0: (score: 2)

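In the second model, y is not a linear function of x. When you write y ~ x, you assume that the probability of y = 1 either rises or falls steadily as x increases, depending on whether the coefficient is positive or negative. That is not the case here: the probability first increases and then decreases, so the average effect of x is zero (hence the straight line). You therefore need a nonlinear function of x. You can get one with the gam function from the mgcv package, where the effect of x is modeled as a smooth function: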

library(mgcv)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

m = gam(y ~ s(x), data = my.data, family = binomial)
plot(m)

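As a quick check of the straight-line explanation above, the following sketch refits the plain linear model and inspects its slope. The object names fit.linear and p.hat are introduced here for illustration only; the data are the same as in the question. Because the ones sit symmetrically in the middle of the x range, the estimated slope should come out at essentially zero, and every fitted probability should be close to the overall share of ones (10 out of 30).

```r
# Same data as in the question: 10 zeros, 10 ones, 10 zeros
my.data <- data.frame(x = seq(from = 0, by = 0.5, length.out = 30),
                      y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

# Plain logistic regression with a single linear term in x
fit.linear <- glm(y ~ x, data = my.data, family = binomial)

# Slope on the log-odds scale; expected to be numerically close to 0
print(coef(fit.linear))

# Fitted probabilities; expected to be (almost) constant at mean(y)
p.hat <- fitted(fit.linear)
print(range(p.hat))
print(mean(my.data$y))
```

A flat fit like this does not mean x is unrelated to y; it only means that a single linear term cannot represent a rise followed by a fall.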

On the original (response) scale, the GAM gives the following fit:

my.data$prediction = predict(m, type = "response")
plot(my.data$x, my.data$y)
lines(my.data$x, my.data$prediction, col = "red")
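The type = "response" argument is what puts the predictions on the probability scale. As a small sketch (the names eta, p and new.x are illustrative and not part of the original answer), the default predict() output of a binomial GAM is on the log-odds (link) scale, and applying the inverse logit plogis() should reproduce the response-scale values. The same call with newdata gives predicted probabilities at x values that were not in the data.

```r
library(mgcv)

# Same data and model as in the answer
my.data <- data.frame(x = seq(from = 0, by = 0.5, length.out = 30),
                      y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
m <- gam(y ~ s(x), data = my.data, family = binomial)

# Default predictions are on the link (log-odds) scale
eta <- predict(m)
# Response-scale predictions are probabilities between 0 and 1
p <- predict(m, type = "response")

# The inverse logit of the link-scale values should match the probabilities
print(all.equal(as.numeric(plogis(eta)), as.numeric(p)))

# Predicted probabilities at new x values (chosen arbitrarily for illustration)
new.x <- data.frame(x = c(2, 7.25, 13))
print(predict(m, newdata = new.x, type = "response"))
```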
