I have the following data with a particular missing-value pattern (every value of vnum1 is missing where vcat1 == 3):
> head(mydf)
       vnum1 vcat1
1 -0.1624229     1
2  0.2465567     1
3         NA     3
4  0.7067778     2
5         NA     3
6 -0.2241726     4
> dput(mydf)
structure(list(vnum1 = c(-0.162422853864248, 0.246556718176803,
NA, 0.706777793886275, NA, -0.224172615208867, 0.0545850414695318,
NA, NA, -1.94778020954922, 1.89581259201036, 0.901973743223488,
-0.31255172156186, -1.67311124367419, 0.491316838004494, NA,
-0.699315343799762, 0.668020448193884, 1.45492995320554, 1.17747976289091,
-0.65137204397438, 1.78678696473193, 2.58978935829221, NA, 1.26534157843481,
0.629748102812663, 0.246596558590885, 0.968707124353133, 0.108668693948881,
-0.219419917000748, 2.25307417017233, -0.626124211646445, -1.16298694223082,
-1.23524906047676, -2.34636152907898, NA, 0.408667368960836,
0.272596114054819, 0.747455245383144, -0.745843219461836, -0.0966351379737077,
1.44803320811527, -1.5434982335725, -0.782902668540696, -0.448286848257394,
NA, 0.168327130336994, -0.493721325506037, 0.397253883862878,
1.57070527855864), vcat1 = structure(c(1L, 1L, 3L, 2L, 3L, 4L,
4L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 3L, 4L, 4L, 4L, 1L, 2L, 4L,
1L, 3L, 2L, 4L, 2L, 1L, 4L, 2L, 2L, 4L, 2L, 1L, 1L, 3L, 1L, 4L,
4L, 4L, 4L, 2L, 4L, 1L, 4L, 3L, 1L, 4L, 4L, 1L), .Label = c("1",
"2", "3", "4"), class = "factor")), .Names = c("vnum1", "vcat1"
), row.names = c(NA, 50L), class = "data.frame")
If I use tapply, I can clearly see the missing category:
> with(mydf,tapply(vnum1, vcat1, mean))
         1          2          3          4 
0.09172749 0.48575555         NA 0.09632024 
But aggregate drops it entirely:
> aggregate(vnum1~vcat1, mydf, mean)
  vcat1      vnum1
1     1 0.09172749
2     2 0.48575555
3     4 0.09632024
I would like to see it in the aggregate output as well. How can I do that? Thanks.
Answer 0 (score: 2)
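In the formula method, the default na.action = na.omit removes every row containing an NA before the groups are formed, so category 3 has no rows left and vanishes from the result. Use na.action = NULL to keep the NA in the result:
aggregate(vnum1 ~ vcat1, mydf, mean, na.action = NULL)
#   vcat1      vnum1
# 1     1 0.09172749
# 2     2 0.48575555
# 3     3         NA
# 4     4 0.09632024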
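To see where the group goes, here is a minimal sketch on a small made-up data frame (df, x and g are hypothetical names, not taken from the question): the default formula call loses the all-NA group, na.action = NULL keeps it as NA, and additionally passing na.rm = TRUE, which aggregate forwards to mean through its ... argument, turns that group into NaN instead.

```r
# Hypothetical toy data: level "c" contains only missing values
df <- data.frame(
  x = c(1, 3, 2, 4, NA, NA),
  g = factor(c("a", "a", "b", "b", "c", "c"))
)

# Default formula method: na.action = na.omit drops the NA rows first,
# so group "c" has no rows left and disappears from the result
res_default <- aggregate(x ~ g, df, mean)

# na.action = NULL skips the NA filtering, so the NA values reach mean()
res_keep <- aggregate(x ~ g, df, mean, na.action = NULL)

# Extra arguments are forwarded to FUN; with na.rm = TRUE the all-NA
# group becomes mean(numeric(0)), which is NaN
res_narm <- aggregate(x ~ g, df, mean, na.rm = TRUE, na.action = NULL)

print(res_default)
print(res_keep)
print(res_narm)
```

In this sketch res_default should list only groups a and b, res_keep should show NA for c, and res_narm should show NaN for c.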
You can also use the data frame method, which has no na.action step, and not have to worry about it:
with(mydf, aggregate(list(vnum1 = vnum1), list(vcat1 = vcat1), mean))
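A sketch of the same data frame method written with the by argument, again on hypothetical toy data (df, x and g are illustrative names); the tapply call is there only to check that both approaches give the same group means, NA included.

```r
# Hypothetical toy data: level "c" contains only missing values
df <- data.frame(
  x = c(1, 3, 2, 4, NA, NA),
  g = factor(c("a", "a", "b", "b", "c", "c"))
)

# Data frame method: NA values in x are not filtered out before grouping,
# so group "c" is kept and mean() returns NA for it
res_df <- aggregate(df["x"], by = df["g"], FUN = mean)

# tapply on the same data, for comparison with the question's output
res_tapply <- with(df, tapply(x, g, mean))

print(res_df)
print(res_tapply)
```

Note that the data frame method still omits rows whose grouping variable (here g) is NA; only missing values in the summarised column are passed through.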