How to call a function on a subset of a random data sample

Time: 2019-07-03 01:08:00

Tags: r

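I am trying to run t.test on a specific subset of my data. Suppose I have a data set of 116 birds and want to randomly draw 35 birds (not necessarily unique) from the "Species" category. I then want to find the mean "Body.Mass" of these randomly drawn birds, and finally call t.test on this sample as a representative of the whole data set.

I first stored the data in the object "bird". I tried drawing a random sample with sample(bird$Species, 35), which produced 35 random species. Now I cannot work out how to subset that random sample further to find the mean of Body.Mass. I tried subsetting with the tidyverse, since that is the only way I know to approach this kind of problem.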

library(dplyr)
bird = read.csv("NZBIRDS.csv")
dput(head(bird))
set.seed(20)
sambird = sample(bird$Species,35)
sambird

bmbird <- sambird %>% summarize(avg = mean(Body.Mass))
bmbird
structure(list(Species = c("Grebes", "Grebes", "Petrels", "Petrels", 
"Petrels", "Petrels"), Name = c("P. cristatus", "P. rufopectus", 
"P. gavia", "P. assimilis", "P. urinatrix", "P. georgicus"), 
Extinct = c("No", "No", "Yes", "Yes", "Yes", "No"), Habitat = c("A", 
"A", "A", "A", "A", "A"), Nest.Site = c("G", "G", "GC", "GC", 
"GC", "GC"), Nest.Density = c("L", "L", "H", "H", "H", "H"
), Diet = c("F", "F", "F", "F", "F", "F"), Flight = c("Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes"), Body.Mass = c(1100L, 
250L, 300L, 200L, 130L, 120L), Egg.Length = c(57, 43, 57, 
54, 38, 39)), .Names = c("Species", "Name", "Extinct", "Habitat", 
"Nest.Site", "Nest.Density", "Diet", "Flight", "Body.Mass", "Egg.Length"
), row.names = c(NA, 6L), class = "data.frame")

Error in UseMethod("summarise_") : no applicable method for 'summarise_' applied to an object of class "factor"

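1 Answer:

Answer 0 (score: 0)

It is unclear whether you want to sample from the list of unique species in the data, or to sample rows so that each "Species" value can appear more than once in the sample. If you want to sample from the unique species, you can do the following: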

# Sample only one species here because the example data
#   contains just two; with the full data, raise the
#   sample size to draw more random species
random_species = sample(unique(bird$Species), 1, replace = FALSE)

bird %>%
    filter(Species %in% random_species) %>%
    group_by(Species) %>%
    summarize(avg = mean(Body.Mass))
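
If you want to sample rows instead, note that the error in the question arises because sample(bird$Species, 35) returns a factor vector of species labels rather than a data frame, so there is no Body.Mass column left to summarize. Below is a minimal sketch of row sampling in base R. The data frame in it is made-up stand-in data (random values with the same column names as the question), not the real NZBIRDS.csv, and the one-sample t.test against the full-data mean is only one possible reading of "representative of the whole data set".

```r
# Stand-in data: made-up values shaped like the question's `bird` data frame
# (116 rows with Species and Body.Mass). Replace with the real data.
set.seed(20)
bird <- data.frame(
  Species   = sample(c("Grebes", "Petrels"), 116, replace = TRUE),
  Body.Mass = round(runif(116, min = 100, max = 1200))
)

# Draw 35 row indices without replacement, then keep those whole rows
# so that Body.Mass stays attached to each sampled bird.
sample_rows <- sample(nrow(bird), 35)
sambird <- bird[sample_rows, ]

# Mean body mass of the sampled birds.
sample_mean <- mean(sambird$Body.Mass)

# Assumption: compare the sample against the mean of the full data
# with a one-sample t-test.
result <- t.test(sambird$Body.Mass, mu = mean(bird$Body.Mass))

print(sample_mean)
print(result)
```

Because whole rows are kept, the same species can appear several times in sambird, which matches the "not unique" requirement. With dplyr 1.0.0 or later, slice_sample(bird, n = 35) draws the rows in the same way.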