Question

My data is binary with two linear independent variables. For both predictors, as they get bigger, there are more positive responses. I have plotted the data in a heatplot showing density of positive responses along the two variables. There are the most positive responses in the top right corner and negative responses in the bottom left, with a gradient change visible along both axes.

I would like to plot a line on the heatplot showing where a logistic regression model predicts that positive and negative responses are equally likely. (My model is of the form response~predictor1*predictor2+(1|participant).)

My question: How can I figure out the line based on this model at which the positive response rate is 0.5?

I tried using predict(), but that works the opposite way; I have to give it values for the factor rather than giving the response rate I want. I also tried using a function that I used before when I had only one predictor (function(x) (log(x/(1-x))-fixef(fit)[1])/fixef(fit)[2]), but I can only get single values out of that, not a line, and I can only get values for one predictor at a time.

Answer 1

使用适合mtcars数据集的简单示例逻辑回归模型，以及描述here的代数，我可以使用以下方法生成带有决策边界的热图：

library(ggplot2)
library(tidyverse)
data("mtcars")

m1 = glm(am ~ hp + wt, data = mtcars, family = binomial)

# Generate combinations of hp and wt across their observed range. Only
#   generating 50 values of each here, which is not a lot but since each
#   combination is included, you get 50 x 50 rows 
pred_df = expand.grid(
    hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
    wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
pred_df$pred_p = predict(m1, pred_df, type = "response")


# For a given value of hp (predictor1), find the value of
#   wt (predictor2) that will give predicted p = 0.5
find_boundary = function(hp_val, coefs) {
    beta_0 = coefs['(Intercept)']
    beta_1 = coefs['hp']
    beta_2 = coefs['wt']
    boundary_wt =  (-beta_0 - beta_1 * hp_val) / beta_2
}


# Find the boundary value of wt for each of the 50 values of hp
# Using the algebra in the linked question you can instead find
# the slope and intercept of the boundary, so you could potentially
# skip this step
boundary_df = pred_df %>%
    select(hp) %>%
    distinct %>%
    mutate(wt = find_boundary(hp, coef(m1)))


ggplot(pred_df, aes(x = hp, y = wt)) +
    geom_tile(aes(fill = pred_p)) +
    geom_line(data = boundary_df)

制作：

请注意，这只考虑了模型的固定效果，因此如果您想以某种方式考虑随机效果，这可能会更复杂。

plot logistic regression line over heat plot

1 个答案: