I ran a linear regression:
lm.fit <- lm(intp.trust~age+v225+age*v225+v240+v241+v242,data=intp.trust)
summary(lm.fit)
and got the following result:
Call:
lm(formula = intp.trust ~ age + v225 + age * v225 + v240 + v241 +
v242, data = intp.trust)
Residuals:
Min 1Q Median 3Q Max
-1.32050 -0.33299 -0.04437 0.30899 2.35520
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.461e+00 2.881e-02 85.418 < 2e-16 ***
age -2.416e-03 5.144e-04 -4.697 2.66e-06 ***
v225 5.794e-04 1.574e-02 0.037 0.971
v240 2.111e-02 2.729e-03 7.734 1.07e-14 ***
v241 -1.177e-03 1.958e-04 -6.014 1.83e-09 ***
v242 -1.473e-02 4.166e-04 -35.354 < 2e-16 ***
age:v225 4.214e-06 3.101e-04 0.014 0.989
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4833 on 34845 degrees of freedom
(21516 observations deleted due to missingness)
Multiple R-squared: 0.05789, Adjusted R-squared: 0.05773
F-statistic: 356.8 on 6 and 34845 DF, p-value: < 2.2e-16
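The exercise says: "Consider the residuals from the regression above. Compare the distribution of the residuals for men and women using an appropriate plot." Men and women are coded in the variable v225. How do I create this plot? First, I created: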
lm.res <- resid(lm.fit)
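But I am not sure what the next step is. The plot should be a scatter plot of the residuals, with the residuals for men and women in different colours.
I tried this, but it did not work: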
ggplot(intp.trust, aes(x = intp.trust, y = lm.res, color = v225)) + geom_point()
Answer 0 (score: 0)
In this line:
ggplot(intp.trust, aes(x = intp.trust, y = lm.res, color = v225)) + geom_point()
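you are saying: "look in the data.frame intp.trust for a variable named lm.res and plot it as y." But you created lm.res as a standalone object, not as a column of intp.trust. Assign the residuals from the model to a new column in the data.frame like this: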
intp.trust$lm.res <- resid(lm.fit)
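and it should work, as long as the model used every row of the data.frame (rows dropped for missingness make resid(lm.fit) shorter than the data). An example with fake data: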
library(ggplot2)
# generate data
true_function <- function(x, is_female) {
  ifelse(is_female, 5, 2) +
    ifelse(is_female, -1.5, 1.5) * x +
    rnorm(length(x))
}
set.seed(123)
dat <- data.frame(x = runif(200, 1, 5), is_female = rbinom(200, 1, .5))
dat$y <- with(dat, true_function(x, is_female))
# regression
lm_fit <- lm(y ~ x + as.factor(is_female), data=dat)
# add residuals to data.frame
dat$resid <- resid(lm_fit)
# plot
ggplot(dat, aes(x = x, y = resid, color = as.factor(is_female))) +
  geom_point()
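One caveat for the data in the question: the summary reports 21516 observations deleted due to missingness, so a plain column assignment of the residuals would likely stop with a length-mismatch error. A minimal sketch of one way around this, assuming simulated data and hypothetical column names (age, sex, trust), is to fit with na.action = na.exclude, which makes resid() pad the dropped rows with NA:

```r
# Simulated stand-in for the real data; all names here are hypothetical
set.seed(1)
n <- 200
dat <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  sex = sample(c(1, 2), n, replace = TRUE)
)
dat$trust <- 2.5 - 0.002 * dat$age + rnorm(n, sd = 0.5)
dat$trust[sample(n, 30)] <- NA  # introduce missing responses

# na.omit (the usual default): residuals exist only for complete rows
fit_omit <- lm(trust ~ age * sex, data = dat, na.action = na.omit)
length(resid(fit_omit))  # shorter than nrow(dat)

# na.exclude: residuals are padded with NA and line up with the rows of dat
fit_excl <- lm(trust ~ age * sex, data = dat, na.action = na.exclude)
dat$lm_res <- resid(fit_excl)
```

ggplot2 then drops the NA residuals with a warning when plotting, so the rest of the approach stays the same.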
Answer 1 (score: 0)
Here is an example you can follow to get what you need:
library(ggplot2)
# Sample Data
x_1 <- rnorm(100)
x_2 <- runif(100, 10, 30)
x_3 <- rnorm(100) * runif(100)
y <- rnorm(100, mean = 10)
gender <- sample(c("F", "M"), 100, replace = TRUE)  # draw one label per row
df <- data.frame(x_1, x_2, x_3, y, gender)
# Fit model
lm.fit <- lm(y ~ x_1 + x_2 + x_1 * x_2 + x_3, data = df)
# Update data.frame
df$residuals <- lm.fit$residuals
# Scatter Residuals
ggplot(df) +
  geom_point(aes(x = as.numeric(row.names(df)), y = residuals, color = gender)) +
  labs(x = 'Index', y = 'Residual value', title = 'Residual scatter plot')
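Since the exercise asks to compare the distribution of the residuals between the two groups, a density plot or box plot may show the comparison more directly than a scatter plot. The following is only a sketch on simulated residuals; the data frame and its column names (gender, residuals) are assumptions made for illustration, not the data from the question:

```r
library(ggplot2)

# Simulated residuals for two groups; names are hypothetical
set.seed(42)
df <- data.frame(
  gender = sample(c("F", "M"), 300, replace = TRUE),
  residuals = rnorm(300, sd = 0.5)
)

# Overlaid density curves, one per group
p_density <- ggplot(df, aes(x = residuals, fill = gender)) +
  geom_density(alpha = 0.4) +
  labs(x = "Residual", y = "Density", title = "Residual distribution by gender")

# Side-by-side box plots of the same residuals
p_box <- ggplot(df, aes(x = gender, y = residuals, fill = gender)) +
  geom_boxplot() +
  labs(x = "Gender", y = "Residual")

print(p_density)
print(p_box)
```

With a numerically coded grouping variable such as v225, wrapping it in as.factor() gives ggplot2 discrete groups instead of a continuous colour scale.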