Linear regression: code for a scatter plot of the residuals

Time: 2020-10-23 17:18:11

Tags: r linear-regression scatter-plot

I ran a linear regression:

lm.fit <- lm(intp.trust~age+v225+age*v225+v240+v241+v242,data=intp.trust)

summary(lm.fit)

and got the following result:

Call:
lm(formula = intp.trust ~ age + v225 + age * v225 + v240 + v241 + 
    v242, data = intp.trust)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.32050 -0.33299 -0.04437  0.30899  2.35520 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.461e+00  2.881e-02  85.418  < 2e-16 ***
age         -2.416e-03  5.144e-04  -4.697 2.66e-06 ***
v225         5.794e-04  1.574e-02   0.037    0.971    
v240         2.111e-02  2.729e-03   7.734 1.07e-14 ***
v241        -1.177e-03  1.958e-04  -6.014 1.83e-09 ***
v242        -1.473e-02  4.166e-04 -35.354  < 2e-16 ***
age:v225     4.214e-06  3.101e-04   0.014    0.989    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4833 on 34845 degrees of freedom
  (21516 observations deleted due to missingness)
Multiple R-squared:  0.05789,   Adjusted R-squared:  0.05773 
F-statistic: 356.8 on 6 and 34845 DF,  p-value: < 2.2e-16

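The task reads: “Consider the residuals from the regression above. Use an appropriate plot to compare the distribution of the residuals for men and women.” Men and women are coded with the variable v225. How do I create this plot? First, I created: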

lm.res <- resid(lm.fit)

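But I am not sure what the next step is. The plot should be a residual scatter plot in which the residuals for men and women have different colors.

I tried this, but it did not work: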

ggplot(intp.trust, aes(x = intp.trust, y = lm.res, color = v225)) + geom_point()

2 answers:

Answer 0: (score: 0)

In this line:

ggplot(intp.trust, aes(x = intp.trust, y = lm.res, color = v225)) + geom_point()

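you are saying: "look in the data.frame intp.trust for a variable named lm.res and plot it as y."

But you created lm.res as a standalone object, not as a column of intp.trust. Assign the residuals from the model to a new column in the data.frame like this: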

intp.trust$lm.res <- resid(lm.fit)

and it should work. An example with fake data:

library(ggplot2)

# generate data
true_function <- function(x, is_female) {
  ifelse(is_female, 5, 2) +
    ifelse(is_female, -1.5, 1.5) * x +
    rnorm(length(x))
}

set.seed(123)
dat <- data.frame(x = runif(200, 1, 5), is_female = rbinom(200, 1, .5))
dat$y <- with(dat, true_function(x, is_female))

# regression
lm_fit <- lm(y ~ x + as.factor(is_female), data=dat)
# add residuals to data.frame
dat$resid <- resid(lm_fit)

# plot
ggplot(dat, aes(x=x, y=resid, color=as.factor(is_female))) +
  geom_point()

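One caveat for the real data: the summary in the question reports 21516 observations deleted due to missingness, so resid(lm.fit) has fewer elements than intp.trust has rows, and the column assignment can fail with a row-count mismatch. A possible workaround, sketched here on hypothetical toy data (the names toy, y, age and sex are made up for illustration and are not from the question), is to fit with na.action = na.exclude so that resid() pads the dropped rows with NA:

```r
# Hypothetical toy data standing in for intp.trust; all names are illustrative
set.seed(1)
toy <- data.frame(
  y   = rnorm(10),
  age = c(25, 40, NA, 31, 58, 62, NA, 19, 45, 37),
  sex = rep(c(0, 1), 5)
)

# na.exclude drops incomplete rows for fitting but remembers their positions
fit <- lm(y ~ age * sex, data = toy, na.action = na.exclude)

# resid() now returns one value per row of toy, with NA for the dropped rows
toy$lm.res <- resid(fit)

nrow(toy)               # 10
sum(is.na(toy$lm.res))  # 2
```

With the default na.action (na.omit), the same assignment would stop with an error because only 8 residuals would be returned for 10 rows. ggplot2 drops rows with NA residuals when plotting and prints a warning about the removed rows.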

Answer 1: (score: 0)

Here is an example you can follow to get what you need:

library(ggplot2)

# Sample data
x_1 <- rnorm(100)
x_2 <- runif(100, 10, 30)
x_3 <- rnorm(100) * runif(100)
y <- rnorm(100, mean = 10)
# Draw 100 labels; without a size argument sample() returns only 2 values
gender <- sample(c("F", "M"), 100, replace = TRUE)
df <- data.frame(x_1, x_2, x_3, y, gender)
# Fit model
lm.fit <- lm(y ~ x_1 + x_2 + x_1 * x_2 + x_3, data = df)
# Add the residuals to the data.frame
df$residuals <- resid(lm.fit)
# Scatter plot of the residuals against row index, colored by gender
ggplot(df) +
  geom_point(aes(x = seq_len(nrow(df)), y = residuals, color = gender)) +
  labs(x = 'Index', y = 'Residual value', title = 'Residual scatter plot')
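
Since the task in the question asks to compare the distribution of the residuals between the two groups, a boxplot or density plot may show the difference more directly than a scatter plot against the index. The following is only a sketch on hypothetical data: the column names sex and resid and the group labels are assumptions standing in for v225 and the model residuals.

```r
library(ggplot2)

# Hypothetical data: 'sex' stands in for v225, 'resid' for the model residuals
set.seed(42)
dat <- data.frame(
  sex   = factor(rep(c("male", "female"), each = 100)),
  resid = c(rnorm(100, sd = 1), rnorm(100, sd = 1.5))
)

# Side-by-side boxplots compare the center and spread of the two groups
p_box <- ggplot(dat, aes(x = sex, y = resid, fill = sex)) +
  geom_boxplot()

# Overlaid density curves compare the shape of the two distributions
p_dens <- ggplot(dat, aes(x = resid, color = sex)) +
  geom_density()

print(p_box)
print(p_dens)
```

If the grouping variable is stored as a number, as v225 appears to be in the regression output, wrapping it in factor() makes ggplot2 treat it as two discrete groups rather than a continuous color scale.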