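Full question (> 150 characters):
In logistic regression with a factor (categorical) predictor that may have multiple levels, how do I find the predictor values corresponding to the probability levels p = 0.25, 0.50, 0.75?
With a continuous predictor, it is easy to obtain the predictor values that correspond to the probability levels p = 0.25, 0.50, 0.75 in logistic regression. See: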
df <- data.frame(hour=c(0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50), pass=c(0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1))
df
df$pass <- as.factor(df$pass)
my_fit <- glm(pass ~ hour, data=df, na.action=na.exclude, family="binomial")
summary(my_fit)
my_table <- summary(my_fit)
my_table$coefficients[,1] <- plogis(coef(my_fit)) # plogis() is the base-R inverse logit
my_table
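plot(df$hour, as.numeric(as.character(df$pass)), xlab="hours studied", ylab="pass (0/1)") # convert the factor back to 0/1 so the y axis is on the probability scale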
LinearPredictions <- predict(my_fit); LinearPredictions
EstimatedProbability.hat <- exp(LinearPredictions)/(1 + exp(LinearPredictions))
EstimatedProbability.hat
EstimatedProbability <- c(0.25, 0.50, 0.75) # Probabilities whose corresponding x values we want to find
HoursStudied <- (log(EstimatedProbability/(1- EstimatedProbability)) - my_fit$coefficients[1])/ my_fit$coefficients[2]
HoursStudied.summary <- data.frame(EstimatedProbability, HoursStudied)
HoursStudied.summary
#   EstimatedProbability HoursStudied
# 1                 0.25     1.979936
# 2                 0.50     2.710083
# 3                 0.75     3.440230
So I can add the horizontal lines y = 0.25, y = 0.50, y = 0.75 and the vertical lines x = 1.98, x = 2.71, x = 3.44 to the logistic regression plot.
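For concreteness, the continuous case described above can be sketched roughly as follows. This is only a minimal illustration, assuming base R graphics; the names `fit`, `p_target`, and `x_target` are made up for this sketch and do not appear in the code above.

```r
# Same data as above, refitted so this sketch runs on its own
df <- data.frame(
  hour = c(0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
           2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50),
  pass = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1)
)
fit <- glm(pass ~ hour, data = df, family = binomial)

# Invert the logit link: x = (logit(p) - b0) / b1
p_target <- c(0.25, 0.50, 0.75)
x_target <- unname((qlogis(p_target) - coef(fit)[1]) / coef(fit)[2])

# Observed 0/1 outcomes with the fitted probability curve on top
plot(df$hour, df$pass, xlab = "Hours studied", ylab = "P(pass)")
curve(predict(fit, newdata = data.frame(hour = x), type = "response"),
      add = TRUE)

abline(h = p_target, lty = 2)  # horizontal lines at the target probabilities
abline(v = x_target, lty = 3)  # vertical lines at the corresponding hours
```

Each vertical line should cross the fitted curve at the height of its matching horizontal line, which is a quick visual check that the inversion is right.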
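However, how can I add the horizontal lines y = 0.25, y = 0.50, y = 0.75 and their corresponding vertical lines to a logistic regression plot (via plot or ggplot) when the predictor is a factor variable, possibly with more than two levels? Or does doing so make no sense at all?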