I'm trying to run a Shapiro-Wilk test on the variable 'Size' in my dataset, subset with ddply by the variables 'Site' and 'Category', but I keep getting an error.
Below is a sample of my dataset (d). I have 937 observations, with 9 categories and 13 sites:
Site Genus Size Category
Arn01 ACR 4 ACR
Arn01 ACR 7 ACR
Arn02 ACR 3 ACR
I wrote a function for the Shapiro-Wilk test:
shap.w <- function(input){ # Shapiro-Wilk test function
  n <- sum(!is.na(input$Size))  # number of non-missing Size values
  if (n > 3 && n < 5000) {
    return(shapiro.test(input$Size)$p.value)
  }
  return(NA)
}
Then I tried to apply the function to my data subsets with ddply:
library(plyr)
sw_test <- ddply(d, .(Site, Category), .fun = shap.w)
But when I do this, I get an error message:
Error in shapiro.test(input$Size) : all 'x' values are identical
even though the values clearly aren't all identical. Any help or advice would be greatly appreciated.
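The error is raised for each subset separately, not for the whole column. shapiro.test() stops with this message whenever the non-missing values it receives have no spread, so a single Site/Category combination in which every Size is the same is enough to abort the whole ddply call. A minimal sketch with made-up data (the data frame toy and its values are hypothetical, not the poster's data) that reproduces this and lists the offending groups in base R:

```r
# Hypothetical toy data: the column as a whole varies, but group "B" is constant
toy <- data.frame(
  Site = rep(c("A", "B"), each = 5),
  Size = c(4, 2, 5, 3, 6, 3, 3, 3, 3, 3)
)

# The full column has several distinct values ...
n.all <- length(unique(toy$Size))

# ... yet testing group "B" on its own fails with the error from the question
msg <- tryCatch(
  shapiro.test(toy$Size[toy$Site == "B"])$p.value,
  error = function(e) conditionMessage(e)
)

# Count distinct non-missing values per group to spot the problem subsets
n.distinct <- tapply(toy$Size, toy$Site, function(x) length(unique(x[!is.na(x)])))
constant.groups <- names(n.distinct)[n.distinct < 2]

print(msg)
print(constant.groups)
```

For the question's data, the same idea works with list(d$Site, d$Category) as the grouping in tapply(), which returns a Site-by-Category matrix of counts (NA for combinations that do not occur).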
ETA: output of dput(d[1:20,]):
> dput(d[1:20,])
structure(list(Site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Arn01n",
"Arn02n", "Arn03n", "Arn04n", "Arn05n", "Arn06n", "Arn07n", "Arn08n",
"Arn09n", "Arn10n", "Arn11n", "Arn12n", "Arn13n"), class = "factor"),
Genus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 30L, 30L, 30L, 30L), .Label = c("ACA",
"ACR", "AST", "COS", "CYP", "ECH", "FUN", "FVA", "FVT", "GAR",
"GON", "HEL", "HYD", "ISO", "LEA", "LEO", "LEP", "LOB", "MER",
"MNT", "MST", "MYC", "PAV", "PBR", "PLA", "PLAT", "POC",
"POD", "PRE", "PRM", "PRS", "PSA", "SAR", "STY"), class = "factor"),
Size = c(4, 2, 4, 4, 3, 5, 5, 4, 4, 4, 4, 3, 6, 3, 4, 5,
2, 3, 3, 6), Category = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 8L), .Label = c("ACR",
"FAV", "FUN", "HEL", "ISO", "MNT", "POC", "PRM", "PRS"), class = "factor")),
.Names = c("Site",
"Genus", "Size", "Category"), row.names = c(NA, 20L), class = "data.frame")
> table(d$Size)
 1   2   3   4   5   6   7   8  9  10 11 12 13 14  15 16 17 18 19  20 22 23 24 25 26 27 28 29 30 31 33 35 36 37 38 39
14 271 525 548 521 424 201 206 50 357 23 95 36  7 171 11 14 30  4 145 11 21  5 46  4  1  5  1 95  1  2 31  3  1  2  1
40 41 42 43 44 45 46 48 50 51 53 55 56 57 60 62 63 65 66 70 72 75 76 80 82 83 85 88 90 94 95 100 105 110 120 125
80  1  9  3  4 22  1  4 42  1  1  4  1  3 64  3  5  9  4 13  1  2  1 20  2  2  2  1  5  1  2  17   1   2   6   2
128 130 143 150 155 160 180 200 230 300 890 920
  1   1   1   1   1   1   1   2   1   1   1   1
Answer 0 (score: 1)
Note that if you return NA, is.numeric gives FALSE; try is.numeric(NA) to see this. You can return NA_real_ instead:
is.numeric(NA)
[1] FALSE
is.numeric(NA_real_)
[1] TRUE
It is still NA:
is.na(NA_real_)
[1] TRUE
However, as.numeric should also solve this (it may be worth double-checking what the function returns to ddply for each subset).
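Applied to the function from the question, this suggestion amounts to returning NA_real_ (a numeric NA) in the skipped branch, so every subset yields a value of the same type. A minimal sketch; the name shap.w.num and the small test data frame are hypothetical. It keeps the question's sample-size guard unchanged, so a subset with more than three identical values would still raise the error.

```r
# Variant of the question's shap.w that always returns a double
shap.w.num <- function(input) {
  n <- sum(!is.na(input$Size))          # non-missing Size values in this subset
  if (n > 3 && n < 5000) {
    return(shapiro.test(input$Size)$p.value)
  }
  NA_real_                              # numeric NA instead of logical NA
}

# Too few values: the skipped branch still returns a numeric value
small <- data.frame(Size = c(2, 3))
res <- shap.w.num(small)
print(is.numeric(res))
print(is.na(res))
```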
Answer 1 (score: 0)
OK, thanks to the help I got in the comments, I was able to solve this by updating the function:
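shap.w <- function(input){ # Shapiro-Wilk test function
  # distinct non-missing Size values; index with is.na(input$Size), not is.na(input),
  # so the logical index has the same length as input$Size
  n.unique <- length(unique(input$Size[!is.na(input$Size)]))
  if (n.unique > 3 && n.unique < 5000) {
    return(shapiro.test(input$Size)$p.value)
  }
  return(NA)
}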
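This skips any Site/Category combination with 3 or fewer distinct Size values, or with 5,000 or more (although no combination in this dataset comes close to 5,000). Once I made this change, the ddply line ran without any problems. Thanks everyone for the help!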
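For reference, here is a sketch of a guard that checks both conditions shapiro.test() enforces and returns NA_real_ otherwise: a non-missing sample size between 3 and 5000, and at least two distinct values. The function name shap.w.safe and the toy data are hypothetical. The per-group loop uses base R's split() and sapply() in place of ddply so the example runs without plyr.

```r
# Shapiro-Wilk p-value for one subset, or NA_real_ when the test cannot run
shap.w.safe <- function(input) {
  x <- input$Size[!is.na(input$Size)]                      # non-missing Size values
  if (length(x) < 3 || length(x) > 5000) return(NA_real_)  # size limits of shapiro.test()
  if (length(unique(x)) < 2) return(NA_real_)              # all values identical
  shapiro.test(x)$p.value
}

# Hypothetical toy data: S2 is constant, S3 has too few values
toy <- data.frame(
  Site     = c(rep("S1", 6), rep("S2", 5), rep("S3", 2)),
  Category = "ACR",
  Size     = c(4, 2, 5, 3, 6, 4, 3, 3, 3, 3, 3, 7, 8)
)

# One p-value per non-empty Site/Category combination
groups <- split(toy, list(toy$Site, toy$Category), drop = TRUE)
p.values <- sapply(groups, shap.w.safe)
print(p.values)
```

With plyr loaded, the same function can be passed as .fun in the ddply call from the question.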