R

Date: 2015-09-18 15:54:17

Tags: r ggplot2 linear-regression

I have the following linear model:

model <- lm(var01 ~ a0 + a1 + a2 + a3 + a4 + a5,NT)

where var01 is an interval-scaled variable ranging from 0 to 100 and a0-a5 are dummy-coded (0/1) variables. summary(model) gives this:

Residuals:
    Min      1Q  Median      3Q     Max 
-75.951 -13.469  -7.239  18.795  80.531 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  59.6015     8.7076   6.845 5.48e-10 ***
a01         -46.1329     8.6302  -5.345 5.37e-07 ***
a11          -0.8744     9.0549  -0.097   0.9233    
a21          22.0408     9.1278   2.415   0.0175 *  
a31           9.5488     9.9284   0.962   0.3384    
a41          14.9227     7.6762   1.944   0.0546 .  
a51          -8.1222    11.8530  -0.685   0.4947    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 32.13 on 104 degrees of freedom
Multiple R-squared:  0.4393,    Adjusted R-squared:  0.407 
F-statistic: 13.58 on 6 and 104 DF,  p-value: 2.486e-11
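Since every predictor is a 0/1 dummy, a fitted value is simply the intercept plus the estimates of the dummies that are switched on. A small worked check using the estimates printed above; the two combinations are illustrative choices, not cases taken from the data:

```r
# Estimates copied from the summary(model) output above
b <- c(intercept = 59.6015, a01 = -46.1329, a11 = -0.8744,
       a21 = 22.0408, a31 = 9.5488, a41 = 14.9227, a51 = -8.1222)

# Fitted var01 when a0 == 1 and all other dummies are 0
fit_a0_only <- unname(b["intercept"] + b["a01"])

# Fitted var01 when a1 == 1 and a2 == 1 and the rest are 0
fit_a1_a2 <- unname(b["intercept"] + b["a11"] + b["a21"])

print(c(fit_a0_only, fit_a1_a2))
```

Because a respondent can have several dummies set to 1, each estimate is an effect holding the other dummies fixed, so it need not match the difference between raw group medians or means that a boxplot shows.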

I want to create a boxplot in which a0-a5 are shown next to each other, but only for a0 == 1, a1 == 1, etc.

So I tried:

ggplot(NT, aes(factor(a0), var01)) +
  geom_boxplot() +
  geom_smooth(method = "lm", se=FALSE, color="black", aes(group=1))

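But this shows the boxplots for a0 == 0 and a0 == 1 next to each other. So there are two questions: how do I get R to show only a0 == 1? And how do I show the other predictors a1-a5 next to a0 in the same plot, each likewise restricted to == 1?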

Any help is much appreciated. Thanks :))

Update: sample data

id  category_a  var01   a0  a1  a2  a3  a4  a5
3   1;5          100    0   1   0   0   0   1
4   1;5            0    0   1   0   0   0   1
5   0             21    1   0   0   0   0   0
6   1;2;4        100    0   1   1   0   1   0
9   1;2           68    0   1   1   0   0   0

So a0-a5 are a dummy coding of the multi-category variable "category_a" (e.g. "1;5" means a1 == 1 and a5 == 1).
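The mapping from "category_a" to the dummies can be reproduced in base R. A minimal sketch, assuming category_a always stores the codes 0-5 as a ";"-separated string exactly as in the sample rows above:

```r
# The five sample rows from the question
NT <- data.frame(
  id         = c(3, 4, 5, 6, 9),
  category_a = c("1;5", "1;5", "0", "1;2;4", "1;2"),
  var01      = c(100, 0, 21, 100, 68),
  stringsAsFactors = FALSE
)

# Split each multi-category string into its individual codes
codes <- strsplit(NT$category_a, ";", fixed = TRUE)

# Add one 0/1 dummy column per category code 0..5 (a0 ... a5)
for (k in 0:5) {
  has_k <- vapply(codes, function(x) as.character(k) %in% x, logical(1))
  NT[[paste0("a", k)]] <- as.integer(has_k)
}

print(NT)
```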

1 answer:

Answer 0 (score: 0)

This is a data-reshaping problem. ggplot works best when every data point you are interested in is its own row in the data frame (long format).

library(ggplot2)
library(reshape2)
#generate data
set.seed(1)
n=1000
NT <- data.frame(id=1:n,
                   var01=rnorm(n),
                   a0=rbinom(n,1,0.2),
                   a1=rbinom(n,1,0.2),
                   a2=rbinom(n,1,0.2),
                   a3=rbinom(n,1,0.2),
                   a4=rbinom(n,1,0.2),
                   a5=rbinom(n,1,0.2))

#reshape the data before plotting:
#ggplot needs each data point on its own row,
#so transform from wide to long format
plotdata <- melt(NT,id.vars=c("id","var01"),variable.name="a")
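For comparison, the same wide-to-long step can be written by hand in base R without reshape2. This is a sketch on the five sample rows from the question; the column names a and value are chosen to match what the melt() call above produces:

```r
# Wide data: the five sample rows from the question
wide <- data.frame(
  id    = c(3, 4, 5, 6, 9),
  var01 = c(100, 0, 21, 100, 68),
  a0 = c(0, 0, 1, 0, 0), a1 = c(1, 1, 0, 1, 1), a2 = c(0, 0, 0, 1, 1),
  a3 = c(0, 0, 0, 0, 0), a4 = c(0, 0, 0, 1, 0), a5 = c(1, 1, 0, 0, 0)
)
dummies <- paste0("a", 0:5)

# Long data: one row per (respondent, dummy) pair.
# unlist() stacks the dummy columns one after another (a0 first, then a1, ...),
# so id/var01 are repeated once per dummy and `a` labels each block.
long <- data.frame(
  id    = rep(wide$id,    times = length(dummies)),
  var01 = rep(wide$var01, times = length(dummies)),
  a     = factor(rep(dummies, each = nrow(wide)), levels = dummies),
  value = unlist(wide[dummies], use.names = FALSE)
)

print(nrow(long))               # 5 respondents x 6 dummies = 30 rows
print(long[long$value == 1, ])  # only the rows where a dummy is switched on
```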

Now it is easy to plot everything:

#plot everything using interaction
p1 <- ggplot(plotdata, aes(x=interaction(a,value), y=var01)) +
  geom_boxplot()
p1
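interaction(a, value) pastes the two columns into a single factor, so p1 gets one box per dummy/value combination. A quick sketch of the resulting level order, using toy vectors that stand in for the a and value columns of plotdata:

```r
# Toy stand-ins for plotdata$a and plotdata$value: every dummy with value 0 and 1
a     <- factor(rep(paste0("a", 0:5), times = 2))
value <- rep(c(0, 1), each = 6)

# By default (lex.order = FALSE) the first factor varies fastest
x <- interaction(a, value)
print(levels(x))
# "a0.0" "a1.0" "a2.0" "a3.0" "a4.0" "a5.0" "a0.1" "a1.1" "a2.1" "a3.1" "a4.1" "a5.1"
```

So in p1 the six value == 0 boxes come first, followed by the six value == 1 boxes; passing lex.order = TRUE to interaction() would instead place each dummy's 0 and 1 boxes side by side.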


Or subset to the rows where value == 1 first:

p2 <- ggplot(plotdata[plotdata$value==1,], 
             aes(x=a, y=var01)) +
  geom_boxplot()

p2
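p2 draws one box per dummy using only the rows where that dummy equals 1. The medians behind those boxes can be checked numerically with aggregate(). Here is a sketch on the five sample rows from the question; in this tiny sample a3 is never 1, so it has no group at all:

```r
# Long-format version of the question's five sample rows (var01, a, value)
dummies <- paste0("a", 0:5)
var01   <- c(100, 0, 21, 100, 68)
ones    <- list(a0 = c(0, 0, 1, 0, 0), a1 = c(1, 1, 0, 1, 1), a2 = c(0, 0, 0, 1, 1),
                a3 = c(0, 0, 0, 0, 0), a4 = c(0, 0, 0, 1, 0), a5 = c(1, 1, 0, 0, 0))
plotdata <- data.frame(
  var01 = rep(var01, times = length(dummies)),
  a     = factor(rep(dummies, each = length(var01)), levels = dummies),
  value = unlist(ones, use.names = FALSE)
)

# Same subset as p2, summarised as the median of var01 per dummy
med <- aggregate(var01 ~ a, data = plotdata[plotdata$value == 1, ], FUN = median)
print(med)
```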
