My data is binary with two linear independent variables. For both predictors, as they get bigger, there are more positive responses. I have plotted the data in a heatplot showing density of positive responses along the two variables. There are the most positive responses in the top right corner and negative responses in the bottom left, with a gradient change visible along both axes.
I would like to plot a line on the heatplot showing where a logistic regression model predicts that positive and negative responses are equally likely. (My model is of the form response~predictor1*predictor2+(1|participant)
.)
My question: How can I figure out the line based on this model at which the positive response rate is 0.5?
I tried using predict(), but that works the opposite way; I have to give it values for the factor rather than giving the response rate I want. I also tried using a function that I used before when I had only one predictor (function(x) (log(x/(1-x))-fixef(fit)[1])/fixef(fit)[2]
), but I can only get single values out of that, not a line, and I can only get values for one predictor at a time.
答案 0 :(得分:2)
使用适合mtcars
数据集的简单示例逻辑回归模型,以及描述here的代数,我可以使用以下方法生成带有决策边界的热图:
library(ggplot2)
library(tidyverse)
data("mtcars")
m1 = glm(am ~ hp + wt, data = mtcars, family = binomial)
# Generate combinations of hp and wt across their observed range. Only
# generating 50 values of each here, which is not a lot but since each
# combination is included, you get 50 x 50 rows
pred_df = expand.grid(
hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
pred_df$pred_p = predict(m1, pred_df, type = "response")
# For a given value of hp (predictor1), find the value of
# wt (predictor2) that will give predicted p = 0.5
find_boundary = function(hp_val, coefs) {
beta_0 = coefs['(Intercept)']
beta_1 = coefs['hp']
beta_2 = coefs['wt']
boundary_wt = (-beta_0 - beta_1 * hp_val) / beta_2
}
# Find the boundary value of wt for each of the 50 values of hp
# Using the algebra in the linked question you can instead find
# the slope and intercept of the boundary, so you could potentially
# skip this step
boundary_df = pred_df %>%
select(hp) %>%
distinct %>%
mutate(wt = find_boundary(hp, coef(m1)))
ggplot(pred_df, aes(x = hp, y = wt)) +
geom_tile(aes(fill = pred_p)) +
geom_line(data = boundary_df)
制作:
请注意,这只考虑了模型的固定效果,因此如果您想以某种方式考虑随机效果,这可能会更复杂。