How do I exclude missing data in a predictor variable from an aov() analysis?

Time: 2016-05-13 22:32:01

Tags: r missing-data anova

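I am trying to run a one-way ANOVA of ctng on a categorical predictor (GE.CATIE) with four levels (HD, HE, EP, ET), followed by a TukeyHSD test. However, the predictor has many missing values, and I want to exclude them from the analysis. They are currently read in as an additional level named "". Here is my code: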

> GEaov<-aov(ctng~allv$GE.CATIE)
> TukeyHSD(GEaov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = ctng ~ allv$GE.CATIE)

$`allv$GE.CATIE`
             diff          lwr          upr     p adj
EP-    0.04003815 -0.147479895  0.227556198 0.9775550
ET-   -0.06458370 -0.400163176  0.270995782 0.9847460
HD-    0.12445374 -0.004557746  0.253465218 0.0647330
HE-   -0.17725081 -0.350691202 -0.003810417 0.0423469
ET-EP -0.10462185 -0.461182978  0.251939281 0.9301554
HD-EP  0.08441558 -0.092123972  0.260955141 0.6873773
HE-EP -0.21728896 -0.428485131 -0.006092791 0.0401655
HD-ET  0.18903743 -0.140533172  0.518608038 0.5190113
HE-ET -0.11266711 -0.462029948  0.236695724 0.9039447
HE-HD -0.30170455 -0.463212338 -0.140196753 0.0000038

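I tried changing the blank values in GE.CATIE to "NA", but the result is the same, except that "NA" is now counted as a level of the predictor. Setting na.action=na.omit does not change anything.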

1 answer:

Answer 0: (score: 1)

# create some data (stringsAsFactors = TRUE is needed in R >= 4.0.0 to get a factor)
> xy <- data.frame(var1 = 1:3, var2 = c("a", "b", ""), stringsAsFactors = TRUE)

# find rows that have `""` in `var2` 
> xy$var2 == ""
[1] FALSE FALSE  TRUE

# subset these rows from the data.frame's variable `var2`
> xy[xy$var2 == "", "var2"]
[1] 
Levels:  a b

# change `""` to `NA` (not `"NA"`)
> xy[xy$var2 == "", "var2"] <- NA

# level `""` is now "orphaned". drop it using `droplevels()`
# (see `levels(xy$var2)`); assign the result, e.g. `xy <- droplevels(xy)`,
# to keep the change
> droplevels(xy)
  var1 var2
1    1    a
2    2    b
3    3 <NA>

NA entries are dropped automatically by aov() under R's default na.action (na.omit).
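
Applied to the data in the question, the same steps might look like the sketch below. The data frame here is simulated, since the real `allv` is not shown: the column names are taken from the question, but the values are made up, and I am assuming `GE.CATIE` is stored as a factor with a `""` level.

```r
# Simulated stand-in for the asker's data (values are illustrative only)
set.seed(1)
allv <- data.frame(
  ctng = rnorm(50),
  GE.CATIE = factor(rep(c("HD", "HE", "EP", "ET", ""), times = 10))
)

# Recode the blank level as a real missing value (NA, not the string "NA")
allv$GE.CATIE[allv$GE.CATIE == ""] <- NA

# Drop the now-unused "" level and keep the result
allv$GE.CATIE <- droplevels(allv$GE.CATIE)
levels(allv$GE.CATIE)

# Rows with NA in GE.CATIE are omitted by the default na.action
GEaov <- aov(ctng ~ GE.CATIE, data = allv)
TukeyHSD(GEaov)
```

After this, the Tukey table should contain only the six pairwise comparisons among EP, ET, HD and HE. If the data come from a CSV file, passing `na.strings = c("", "NA")` to `read.csv()` should turn the blanks into NA at import time, which would make the recoding step unnecessary.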