Effect of factor vs. vector object types in a linear model

Time: 2013-07-30 18:52:10

Tags: r lm anova r-factor

What explains the difference between the two models below?

a = c(0.04875,0.13725,0.28350,0.50975,0.77425,0.94700,0.05325,0.14050,0.29725,0.51525,0.79000,0.95400,0.04625,0.15250,0.29000,0.53300,0.79825,0.95225,0.05025,0.14625,0.28800,0.52625,0.78200,0.95925,0.04700,0.14225,0.30325,0.53500,0.79325,0.95875,0.04775,0.13850,0.28675,0.54250,0.78300,0.95175,0.05150,0.12725,0.30175,0.54725,0.79475,0.96275,0.05375,0.14100,0.30050,0.53275,0.78100,0.96175,0.05450,0.15300,0.29650,0.52850,0.80100,0.95675,0.05425,0.13975,0.30875,0.56025,0.80575,0.96100,0.05100,0.15350,0.31175,0.53300,0.78900,0.96000,0.04650,0.13525,0.29600,0.53625,0.78475,0.96375,0.05375,0.13900,0.29600,0.53725,0.78700,0.95800,0.05075,0.14350,0.29225,0.54525,0.80275,0.95800,0.05050,0.13200,0.29850,0.52700,0.80525,0.96150,0.05150,0.14050,0.29450,0.54375,0.79450,0.96375,0.05375,0.13525,0.30475,0.55250,0.79425,0.96025,0.04950,0.14500,0.29425,0.52250,0.78475,0.95650,0.05225,0.14425,0.29225,0.53150,0.80425,0.95375)
b = c(4,4,4,4,4,4,6,6,6,6,6,6,8,8,8,8,8,8,10,10,10,10,10,10,12,12,12,12,12,12,14,14,14,14,14,14,16,16,16,16,16,16,18,18,18,18,18,18,20,20,20,20,20,20,22,22,22,22,22,22,24,24,24,24,24,24,26,26,26,26,26,26,28,28,28,28,28,28,30,30,30,30,30,30,32,32,32,32,32,32,34,34,34,34,34,34,36,36,36,36,36,36,38,38,38,38,38,38,40,40,40,40,40,40)
c = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)


summary(lm(a~b*as.factor(c)))
summary(lm(a~b*c))

Is it that c is treated as non-ordinal (categorical) when as.factor is used?
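The b and c vectors above follow a regular pattern, so they can be generated rather than typed out. A minimal sketch, assuming the hypothetical name `cc` in place of `c` (renamed only so the base function `c()` is not masked):

```r
# Rebuild the question's predictors with rep().
b  <- rep(seq(4, 40, by = 2), each = 6)   # 4 six times, 6 six times, ..., 40 six times
cc <- rep(1:6, times = 19)                # 1:6 repeated 19 times

length(b)    # 114 observations
table(cc)    # each level 1..6 occurs 19 times

# With the question's `a` vector defined, the two fits would be:
# summary(lm(a ~ b * as.factor(cc)))
# summary(lm(a ~ b * cc))
```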

2 answers:

Answer 0 (score: 2)

In both cases you are modeling a as a function of b, c, and their interaction.

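When you coerce c to a factor, dummy variables are created for each distinct value of c (strictly, for each level of c, but here every level is present, so the two are the same). With R's default treatment contrasts, the first level (1) is the baseline absorbed into the intercept, so five dummies are created. The interactions explored are therefore between b and each level of c.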
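A minimal sketch of what the coercion produces, again using the hypothetical name `cc` for the question's c and assuming R's default `options(contrasts = ...)`: `as.factor()` returns an unordered factor, and treatment contrasts give every level except the baseline its own indicator column.

```r
cc <- rep(1:6, times = 19)
f  <- as.factor(cc)

levels(f)       # "1" "2" "3" "4" "5" "6"
is.ordered(f)   # FALSE: an unordered (nominal) factor

# Treatment contrasts, the default for unordered factors:
# a 6 x 5 matrix whose row "1" is all zeros (the baseline level).
contrasts(f)

# An ordered factor would instead get polynomial contrasts (contr.poly).
contrasts(factor(cc, ordered = TRUE))
```

Under these defaults c is indeed treated as nominal: the numeric distance between levels plays no role in the factor model.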

Otherwise, the interaction explored is that of the two numeric variables, i.e. a single b × c product term.
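Written out with the question's variables, the two formulas fit the models below. D_ij is a hypothetical name for the indicator that observation i has c_i = j, with level 1 as the baseline.

```latex
% a ~ b * as.factor(c): 1 + 1 + 5 + 5 = 12 coefficients
a_i = \beta_0 + \beta_1 b_i
      + \sum_{j=2}^{6} \gamma_j D_{ij}
      + \sum_{j=2}^{6} \delta_j \, b_i D_{ij} + \varepsilon_i

% a ~ b * c: 4 coefficients
a_i = \beta_0 + \beta_1 b_i + \beta_2 c_i + \beta_3 b_i c_i + \varepsilon_i
```

In the first model the slope of b is β1 at c = 1 and β1 + δj at c = j, six slopes with no constraint linking them. In the second it is β1 + β3·c, forced to change linearly with c. Since c_i = 1 + Σ_{j=2..6} (j - 1)·D_ij, the second model is a special case of the first.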

If c had more widely spread values, the difference might be more apparent, e.g.

c = c(1, 17, 2, 5, 131, 1, 4, 5, 2, 11, 17, 7, 1, 1, 17, .... etc)   
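A minimal sketch of this point, under stated assumptions: the response `y` is simulated (hypothetical, not the question's a), and the six levels are relabeled with the uneven codes 1, 17, 2, 5, 131, 7, loosely following the example above. A one-to-one relabeling should leave the factor fit unchanged but change the numeric fit, because only the numeric model uses the spacing between values.

```r
set.seed(1)
b  <- rep(seq(4, 40, by = 2), each = 6)
cc <- rep(1:6, times = 19)
# Hypothetical response, for illustration only
y  <- 0.05 + 0.02 * b + 0.01 * b * cc + rnorm(length(b), sd = 0.02)

# Relabel the six levels with unevenly spaced codes (one-to-one mapping)
new_codes <- c(1, 17, 2, 5, 131, 7)
cc2 <- new_codes[cc]

fac1 <- lm(y ~ b * as.factor(cc))
fac2 <- lm(y ~ b * as.factor(cc2))
num1 <- lm(y ~ b * cc)
num2 <- lm(y ~ b * cc2)

# Factor coding ignores the values themselves: identical fitted values
all.equal(fitted(fac1), fitted(fac2))   # TRUE

# Numeric coding depends on the spacing: the fits differ
c(R2_num1 = summary(num1)$r.squared, R2_num2 = summary(num2)$r.squared)
```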

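Also, as a side note while learning R: avoid using c as a variable name. It is also the name of a frequently used function, and reusing it can quickly make code unreadable and lead to confusion.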
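A short illustration of why the question's `c = c(...)` still runs yet reads confusingly: when a name is used in call position, R skips non-function bindings while looking it up, so base `c()` keeps working even though the bare name `c` now refers to a vector. The variable `x` is a hypothetical name used only for the demonstration.

```r
c <- c(1, 2, 3)   # a vector now shadows the name `c`
x <- c(4, 5)      # still calls base::c(): the vector binding is skipped
print(x)          # 4 5
print(c)          # 1 2 3: the bare name is the vector
rm(c)             # remove the user variable; `c` refers to base::c again
```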

Answer 1 (score: 0)

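You can examine the structure of the models you create by looking at the result of model.matrix(), since model.matrix is what lm uses to build the data for the analysis from the right-hand side (RHS) of the formula: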

> dim(model.matrix(~b*as.factor(c)))
[1] 114  12
> dim( model.matrix(~b*c))
[1] 114   4

> colnames(model.matrix(~b*as.factor(c)))
 [1] "(Intercept)"     "b"               "as.factor(c)2"   "as.factor(c)3"  
 [5] "as.factor(c)4"   "as.factor(c)5"   "as.factor(c)6"   "b:as.factor(c)2"
 [9] "b:as.factor(c)3" "b:as.factor(c)4" "b:as.factor(c)5" "b:as.factor(c)6"
> colnames( model.matrix(~b*c))
[1] "(Intercept)" "b"           "c"           "b:c"  
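The 12 columns of the factor model are 1 intercept, 1 slope for b, 5 level indicators, and 5 b-by-level interactions. To check that lm works from this design matrix, a minimal sketch with a simulated response (hypothetical; seed and coefficients are arbitrary): the least-squares solution computed directly from model.matrix() with qr.solve() should match coef() of the lm fit.

```r
set.seed(42)
b  <- rep(seq(4, 40, by = 2), each = 6)
cc <- rep(1:6, times = 19)
y  <- 0.05 + 0.02 * b + rnorm(length(b), sd = 0.02)  # hypothetical response

X   <- model.matrix(~ b * as.factor(cc))  # 114 x 12 design matrix
fit <- lm(y ~ b * as.factor(cc))

# Least-squares coefficients straight from X via a QR decomposition
# (lm itself also fits by QR)
beta <- qr.solve(X, y)

all.equal(unname(beta), unname(coef(fit)))  # TRUE
```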

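In the second model, the "c" variable is not split into one column per level as it is in the first. The "b:c" column is simply the product of "b" and "c" (describe() below comes from the Hmisc package):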

> describe(b*c)
b * c 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
    114       0      67      77    12.0    16.6    30.5    62.0   111.5   160.0 
    .95 
  190.7 

lowest :   4   6   8  10  12, highest: 200 204 216 228 240 
> describe(model.matrix(~b*c)[, "b:c"])
model.matrix(~b * c)[, "b:c"] 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
    114       0      67      77    12.0    16.6    30.5    62.0   111.5   160.0 
    .95 
  190.7 

lowest :   4   6   8  10  12, highest: 200 204 216 228 240
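Without Hmisc, the same check can be done in base R. A minimal sketch, again with the hypothetical names `cc` and `y` (y simulated): the b:cc column equals b * cc, and because cc and b:cc are linear combinations of the factor model's columns, the numeric model is nested in the factor model and the two can be compared with anova().

```r
set.seed(7)
b  <- rep(seq(4, 40, by = 2), each = 6)
cc <- rep(1:6, times = 19)
y  <- 0.05 + 0.02 * b + rnorm(length(b), sd = 0.02)  # hypothetical response

X <- model.matrix(~ b * cc)
colnames(X)                              # "(Intercept)" "b" "cc" "b:cc"
all.equal(unname(X[, "b:cc"]), b * cc)   # TRUE
range(b * cc)                            # 4 240
mean(b * cc)                             # 77

# Nested comparison: 110 vs 102 residual df, F-test on the 8 extra columns
m_num <- lm(y ~ b * cc)
m_fac <- lm(y ~ b * as.factor(cc))
anova(m_num, m_fac)
```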