I have a data frame with the following structure:
str(Ehen)
'data.frame': 412 obs. of 5 variables:
$ DATE : Date, format: "2012-09-11" "2012-09-19" ...
$ Population: Factor w/ 9 levels "Brathay","Clun",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Fish : Factor w/ 3 levels "C","S","T": 2 2 2 2 2 2 2 2 2 2 ...
$ Length : int NA 70 70 80 70 60 70 60 60 70 ...
$ Width : int NA 60 50 70 60 50 60 50 50 60 ...
What I want to test is whether the distribution of each population is normal, with the data grouped by date and by fish.
I tried:
aggregate(Ehen$Length ~ Ehen$Fish + Ehen$DATE, FUN =shapiro.test)
Ehen$Fish Ehen$DATE Ehen$Length
1 C 2012-09-19 0.7975819
2 S 2012-09-19 0.8164554
3 S 2012-09-25 0.7935195
4 S 2012-10-04 0.9006435
5 C 2012-10-09 0.8411583
6 S 2012-10-09 0.913051
7 S 2012-10-11 0.8525953
8 C 2012-10-18 0.9084524
9 S 2012-10-18 0.9415459
10 C 2012-10-24 0.9592422
11 S 2012-10-24 0.9774688
12 C 2012-11-02 0.9536037
13 S 2012-11-02 0.9607917
14 C 2012-11-12 0.9570341
15 S 2012-11-12 0.9728865
This is more or less what I want, but how do I get the p-value of the Shapiro test instead of the W statistic?
I can do it one date at a time:
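The reason a bare `shapiro.test` gives W is that it does not return a single number: it returns an object of class "htest", which is a list whose elements include `$statistic` (the W value) and `$p.value`. A minimal illustration, using made-up normal data (`rnorm`) rather than the question's `Ehen`, is below; wrapping the call so it returns `$p.value` is all `aggregate()` needs.

```r
# shapiro.test() returns an "htest" object (a list), not a single number.
set.seed(1)
x <- rnorm(30)           # illustrative data only, not from the question
res <- shapiro.test(x)

class(res)               # "htest"
res$statistic            # named numeric element: W
res$p.value              # the p-value as a plain number
```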
shapiro.test(Ehen$Length[Ehen$DATE=="2012-10-24"])
data: Ehen$Length[Ehen$DATE == "2012-10-24"]
W = 0.9761, p-value = 0.2868
But that's not enough... so I tried:
lapply(split(Ehen$Length, Ehen$Fish, drop = TRUE),shapiro.test)
$C
Shapiro-Wilk normality test
data: X[[1L]]
W = 0.9219, p-value = 1.548e-07
$S
Shapiro-Wilk normality test
data: X[[2L]]
W = 0.9201, p-value = 2.056e-10
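However, I don't know how to also include DATE as a grouping variable when subsetting the data for the test.
I may be going about this completely wrong, or I may be close to the answer! Thanks in advance.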
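One way to bring DATE into the `split()` approach is to pass a list of grouping variables: `split()` then groups by their interaction, so each element holds the lengths for one Fish x DATE combination. The sketch below uses a small hypothetical stand-in for `Ehen` (only DATE, Fish and Length) and keeps just the p-value from each test.

```r
# Hypothetical toy data standing in for Ehen (DATE, Fish, Length only)
set.seed(42)
Ehen <- data.frame(
  DATE   = rep(as.Date(c("2012-09-19", "2012-09-25")), each = 40),
  Fish   = rep(rep(c("C", "S"), each = 20), times = 2),
  Length = sample(60:80, 80, replace = TRUE)
)

# A list of factors makes split() group by their interaction;
# drop = TRUE discards Fish x DATE combinations that never occur.
groups <- split(Ehen$Length, list(Ehen$Fish, Ehen$DATE), drop = TRUE)

# Run one test per group and keep only the p-value
p_values <- sapply(groups, function(x) shapiro.test(x)$p.value)
p_values   # names look like "C.2012-09-19", "S.2012-09-19", ...
```

`sapply()` simplifies the result to a named numeric vector, which is easier to scan than the printed list of full test reports.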
Answer:
You could try:
res <- aggregate(cbind(P.value=Length) ~ Fish + DATE, Ehen,
FUN = function(x) shapiro.test(x)$p.value)
head(res,3)
# Fish DATE P.value
#1 C 2012-09-19 0.25510132  # <- matches the direct test below
#2 S 2012-09-19 0.11941675
#3 C 2012-09-20 0.04459457
# Check the first group directly:
shapiro.test(Ehen$Length[Ehen$DATE=='2012-09-19' & Ehen$Fish=='C'])
# Shapiro-Wilk normality test
#data: Ehen$Length[Ehen$DATE == "2012-09-19" & Ehen$Fish == "C"]
# W = 0.9414, p-value = 0.2551  # <- same p-value as row 1 above
# Example data (simulated):
set.seed(25)
Ehen <- data.frame(DATE= sample(seq(as.Date('2012-09-19'), length.out=10,
by='1 day'), 412, replace=TRUE), Fish= sample(c("C", "S"), 412,
replace=TRUE), Length=sample(c(NA,60:80), 412,replace=TRUE))
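With the real data, grouping by Population as well will produce some small groups, and `shapiro.test()` stops with an error when a group has fewer than 3 (or more than 5000) non-missing values, or when all values are identical; one such group aborts the whole `aggregate()` call. A minimal sketch of a guard, assuming returning NA for untestable groups is acceptable, is below; `safe_shapiro_p` is a hypothetical helper name and the data are simulated, including a deliberately tiny "Tiny" population.

```r
# Hypothetical helper: return the Shapiro-Wilk p-value, or NA when
# shapiro.test() would refuse the sample (n < 3, n > 5000, or constant values).
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# Simulated data with a Population column (not the asker's real data)
set.seed(7)
Ehen <- data.frame(
  DATE       = as.Date("2012-09-19") + sample(0:2, 300, replace = TRUE),
  Population = sample(c("Brathay", "Clun"), 300, replace = TRUE),
  Fish       = sample(c("C", "S"), 300, replace = TRUE),
  Length     = sample(c(NA, 60:80), 300, replace = TRUE)
)
# Add a two-row group that would make plain shapiro.test() fail
Ehen <- rbind(Ehen, data.frame(DATE = as.Date("2012-09-19"), Population = "Tiny",
                               Fish = "C", Length = c(65L, 70L)))

# The formula method drops NA rows (na.action = na.omit) before grouping
res <- aggregate(cbind(P.value = Length) ~ Population + Fish + DATE,
                 data = Ehen, FUN = safe_shapiro_p)
res
```

Groups that cannot be tested show up as NA instead of stopping the whole computation, so they can be inspected or filtered out afterwards.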