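I'm trying to run a one-way ANOVA on a dataset with a categorical predictor that has four levels (HD, HE, EP, ET) (ctng), and then analyze it with a TukeyHSD test. However, my predictor variable has many missing values, and I would like to exclude them from the analysis. They are being read in as an additional level named "". Here is my code: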
> GEaov<-aov(ctng~allv$GE.CATIE)
> TukeyHSD(GEaov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = ctng ~ allv$GE.CATIE)

$`allv$GE.CATIE`
              diff          lwr          upr     p adj
EP-     0.04003815 -0.147479895  0.227556198 0.9775550
ET-    -0.06458370 -0.400163176  0.270995782 0.9847460
HD-     0.12445374 -0.004557746  0.253465218 0.0647330
HE-    -0.17725081 -0.350691202 -0.003810417 0.0423469
ET-EP  -0.10462185 -0.461182978  0.251939281 0.9301554
HD-EP   0.08441558 -0.092123972  0.260955141 0.6873773
HE-EP  -0.21728896 -0.428485131 -0.006092791 0.0401655
HD-ET   0.18903743 -0.140533172  0.518608038 0.5190113
HE-ET  -0.11266711 -0.462029948  0.236695724 0.9039447
HE-HD  -0.30170455 -0.463212338 -0.140196753 0.0000038
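I tried changing the blank values in GE.CATIE to "NA", but then it does the same thing, except that it now counts "NA" as a level of the predictor. Setting na.action=na.omit doesn't change anything.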
Answer 0 (score: 1)
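# create some data (`stringsAsFactors = TRUE` makes `var2` a factor;
# since R 4.0.0 the default is FALSE, which would give a character column)
> xy <- data.frame(var1 = 1:3, var2 = c("a", "b", ""), stringsAsFactors = TRUE)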
# find rows that have `""` in `var2`
> xy$var2 == ""
[1] FALSE FALSE TRUE
# subset these rows from the data.frame's variable `var2`
> xy[xy$var2 == "", "var2"]
[1]
Levels: a b
# change `""` to `NA` (not `"NA"`)
> xy[xy$var2 == "", "var2"] <- NA
# level `""` is now "orphaned". drop it using `droplevels()`
# (see `levels(xy$var2)`). note that this only prints the result;
# use `xy <- droplevels(xy)` to keep the change
> droplevels(xy)
  var1 var2
1    1    a
2    2    b
3    3 <NA>
NA entries will then be removed automatically by aov (with the default na.action setting, na.omit).
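As a rough sketch of how the same recoding might be applied to the setup in the question: the data frame below is simulated, since the original allv data is not shown, so the values and group sizes are made up purely for illustration. Only the column names ctng and GE.CATIE come from the question.

```r
# Simulated stand-in for the asker's data (hypothetical values, 10 rows per group).
set.seed(1)
allv <- data.frame(
  ctng     = rnorm(50),
  GE.CATIE = factor(rep(c("HD", "HE", "EP", "ET", ""), each = 10))
)

# Recode the blank level as a real missing value (NA, not the string "NA").
allv$GE.CATIE[allv$GE.CATIE == ""] <- NA

# Drop the now-unused "" level and store the result.
allv$GE.CATIE <- droplevels(allv$GE.CATIE)
levels(allv$GE.CATIE)

# Rows with NA in GE.CATIE are omitted when the model is fitted.
# Passing `data = allv` also gives the term the cleaner name GE.CATIE.
GEaov <- aov(ctng ~ GE.CATIE, data = allv)
TukeyHSD(GEaov)
```

With the blank level gone, the Tukey table should contain only the six pairwise comparisons among EP, ET, HD and HE, without the "EP-"-style rows that compared each group against the empty level.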