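I have the following linear model:
model <- lm(var01 ~ a0 + a1 + a2 + a3 + a4 + a5, data = NT)
where var01 is an interval-scaled variable ranging from 0 to 100, and a0-a5 are dummy-coded (0/1) variables. summary(model) gives this: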
Residuals:
    Min      1Q  Median      3Q     Max
-75.951 -13.469  -7.239  18.795  80.531

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  59.6015     8.7076   6.845 5.48e-10 ***
a01         -46.1329     8.6302  -5.345 5.37e-07 ***
a11          -0.8744     9.0549  -0.097   0.9233
a21          22.0408     9.1278   2.415   0.0175 *
a31           9.5488     9.9284   0.962   0.3384
a41          14.9227     7.6762   1.944   0.0546 .
a51          -8.1222    11.8530  -0.685   0.4947
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 32.13 on 104 degrees of freedom
Multiple R-squared:  0.4393,  Adjusted R-squared:  0.407
F-statistic: 13.58 on 6 and 104 DF,  p-value: 2.486e-11
I want to create a boxplot that shows a0-a5 next to each other, but only for the cases where a0 == 1, a1 == 1, and so on.
So I tried this:
ggplot(NT, aes(factor(a0), var01)) +
geom_boxplot() +
geom_smooth(method = "lm", se=FALSE, color="black", aes(group=1))
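But this shows the boxplots for a0 == 0 and a0 == 1 next to each other. So there are two questions: how do I get R to show only a0 == 1? And how do I show the other five predictors a1-a5 next to a0 in the same plot, each likewise restricted to the cases where it equals 1?
Any help is much appreciated. Thanks :))
Update: sample data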
id  category_a  var01  a0  a1  a2  a3  a4  a5
3   1;5         100    0   1   0   0   0   1
4   1;5         0      0   1   0   0   0   1
5   0           21     1   0   0   0   0   0
6   1;2;4       100    0   1   1   0   1   0
9   1;2         68     0   1   1   0   0   0
So a0-a5 are the dummy coding of the multi-category variable "category_a", where a case can belong to several categories at once.
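One way to derive such dummies, sketched under the assumption that category_a is a character string of ";"-separated codes from 0 to 5 as in the sample rows: split each string with base R's strsplit() and test each code with %in%. The data frame re-enters only the five sample rows; this is an illustration, not necessarily how the original dummies were built.

```r
# Sample rows from the question; category_a holds ";"-separated category codes
NT <- data.frame(id = c(3, 4, 5, 6, 9),
                 category_a = c("1;5", "1;5", "0", "1;2;4", "1;2"),
                 var01 = c(100, 0, 21, 100, 68),
                 stringsAsFactors = FALSE)

# Split each string into its individual codes
codes <- strsplit(NT$category_a, ";", fixed = TRUE)

# One 0/1 column per category: a0 is 1 if code "0" occurs, a1 if "1" occurs, ...
for (k in 0:5) {
  NT[[paste0("a", k)]] <- as.integer(
    vapply(codes, function(x) as.character(k) %in% x, logical(1))
  )
}

NT
```

Matching whole codes with %in% rather than a substring search such as grepl() means a code like "1" would not accidentally match "10" if more categories were ever added.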
Answer (score: 0)
This is a data-reshaping problem. ggplot works best when each data point you are interested in is its own row in the data frame (long format).
library(ggplot2)
library(reshape2)
#generate data
set.seed(1)
n=1000
NT <- data.frame(id=1:n,
var01=rnorm(n),
a0=rbinom(n,1,0.2),
a1=rbinom(n,1,0.2),
a2=rbinom(n,1,0.2),
a3=rbinom(n,1,0.2),
a4=rbinom(n,1,0.2),
a5=rbinom(n,1,0.2))
#do some data-reshaping before plotting
#ggplot needs each data-point on one line
#so transform to long
plotdata <- melt(NT,id.vars=c("id","var01"),variable.name="a")
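reshape2 is retired (its authors recommend tidyr instead), so as an alternative, here is a base-R sketch of the same reshaping that needs no extra package. It regenerates NT with a hypothetical smaller n so the block runs on its own, and builds a plotdata with the same columns as the melt() output (id, var01, a, value), where a is a factor with levels a0 to a5.

```r
set.seed(1)
n <- 20  # hypothetical small sample size, for illustration only
NT <- data.frame(id = 1:n, var01 = rnorm(n),
                 a0 = rbinom(n, 1, 0.2), a1 = rbinom(n, 1, 0.2),
                 a2 = rbinom(n, 1, 0.2), a3 = rbinom(n, 1, 0.2),
                 a4 = rbinom(n, 1, 0.2), a5 = rbinom(n, 1, 0.2))

dummies <- paste0("a", 0:5)

# Stack the six dummy columns on top of each other, repeating id and var01
# once per dummy, so that each (respondent, dummy) pair becomes one row
plotdata <- data.frame(
  id    = rep(NT$id, times = length(dummies)),
  var01 = rep(NT$var01, times = length(dummies)),
  a     = factor(rep(dummies, each = nrow(NT)), levels = dummies),
  value = unlist(NT[dummies], use.names = FALSE)
)

head(plotdata)
```

unlist() on the selected columns concatenates them in column order (all of a0, then all of a1, and so on), which is why the labels in a are built with rep(..., each = nrow(NT)).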
Now it is easy to plot everything:
#plot everything using interaction
p1 <- ggplot(plotdata, aes(x=interaction(a,value), y=var01)) +
geom_boxplot()
p1
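interaction(a, value) pastes the two factors together into levels such as "a0.0" and "a0.1". By default the first factor varies fastest, so the x axis runs a0.0, a1.0, ..., a5.0 before any of the ".1" boxes; passing lex.order = TRUE instead keeps each dummy's 0 and 1 boxes next to each other. A small base-R check of the level order, using toy vectors rather than the question's data:

```r
a     <- factor(c("a0", "a1"), levels = c("a0", "a1"))
value <- c(0, 1)

# Default: the first factor (a) varies fastest
levels(interaction(a, value))
# [1] "a0.0" "a1.0" "a0.1" "a1.1"

# Lexicographic order: all combinations for a0 first, then a1
levels(interaction(a, value, lex.order = TRUE))
# [1] "a0.0" "a0.1" "a1.0" "a1.1"
```

In the plot above, aes(x = interaction(a, value, lex.order = TRUE), y = var01) would therefore place each dummy's 0 and 1 boxes side by side.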
Or keep only the rows where the dummy equals 1:
p2 <- ggplot(plotdata[plotdata$value==1,],
aes(x=a, y=var01)) +
geom_boxplot()
p2
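Reshaping is the idiomatic route for ggplot. For comparison, a base-R sketch that skips reshaping: for each dummy, collect the var01 values of the rows where that dummy equals 1, and pass the resulting named list to boxplot(). It re-enters the five sample rows from the question's update by hand, so the boxes are only illustrative.

```r
# Sample rows from the question's update
NT <- data.frame(id = c(3, 4, 5, 6, 9),
                 var01 = c(100, 0, 21, 100, 68),
                 a0 = c(0, 0, 1, 0, 0), a1 = c(1, 1, 0, 1, 1),
                 a2 = c(0, 0, 0, 1, 1), a3 = c(0, 0, 0, 0, 0),
                 a4 = c(0, 0, 0, 1, 0), a5 = c(1, 1, 0, 0, 0))

dummies <- paste0("a", 0:5)

# For each dummy, keep var01 only where that dummy is 1
groups <- lapply(dummies, function(d) NT$var01[NT[[d]] == 1])
names(groups) <- dummies

lengths(groups)  # rows behind each box: a0 1, a1 4, a2 2, a3 0, a4 1, a5 2
boxplot(groups, xlab = "category (dummy == 1)", ylab = "var01")
```

Because the dummies overlap, the same respondent can feed several boxes (id 6 contributes to a1, a2 and a4), so the boxes are not independent groups.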