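I have a data frame with the columns fruit, ripeness, and mean. How can I write a for loop that runs a t-test to determine the difference in means by ripeness for each fruit? In other words, for apples, the t-test would report the difference in means between ripe and unripe apples. The data look like the table below.
Answer 0 (score: 2)
Something like this may work: it iterates over the unique "fruit" values that occur in the data and returns the p-value of a t-test comparing "ripeness" for each one.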
## filter() below comes from dplyr
library(dplyr)
## create a vector of the unique fruit in the data; vector of fruit to be tested
fruit <- unique(data$Fruits)
## iterate through your list of unique fruit, testing as you go
## (seq_along() is safe even if `fruit` is empty, unlike 1:length(fruit))
for (i in seq_along(fruit)) {
  ## subset your data to include only the current fruit to be tested
  df <- filter(data, Fruits == fruit[i])
  ## let the user know which fruit is being tested
  message(fruit[i])
  ## create a vector of the unique ripeness states of the current fruit to be tested
  ripe <- unique(df$Ripeness)
  ## make sure two means exist; ensure there are both ripe and non-ripe values
  if (length(ripe) < 2) {
    ## if only one ripeness, let user know and skip to next unique fruit
    message("only one ripeness")
    next
  }
  ## try testing the fruit and return p-value if success
  tryCatch(
    {
      message(t.test(Mean ~ Ripeness, data = df)$p.value)
    },
    ## if error in t-testing return message that there are "not enough observations"
    error = function(cond) {
      message("not enough observations")
    }
  )
}
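The loop above only prints each p-value. If the results are needed for further work, a variant that collects them into a data frame may be more convenient. The following is a minimal sketch in base R, not the answer's own code: the toy data are invented for illustration, and the column names Fruits, Ripeness, and Mean are assumed to match the ones used above.

```r
# Hypothetical toy data; column names are assumed to match the loop above.
set.seed(1)
data <- data.frame(
  Fruits   = rep(c("Apple", "Banana", "Orange"), times = c(10, 5, 10)),
  Ripeness = c(rep(c("ripe", "unripe"), each = 5),  # Apple: both states
               rep("ripe", 5),                      # Banana: only one state
               rep(c("ripe", "unripe"), each = 5)), # Orange: both states
  Mean     = runif(25, min = 0, max = 10)
)

# Split the data by fruit and run one t-test per fruit.
# NA marks fruits that cannot be tested (one ripeness state or too few rows).
p_values <- sapply(split(data, data$Fruits), function(df) {
  if (length(unique(df$Ripeness)) < 2) return(NA_real_)
  tryCatch(
    t.test(Mean ~ Ripeness, data = df)$p.value,
    error = function(cond) NA_real_
  )
})

# One row per fruit, with the p-value of its ripe vs. unripe comparison.
results <- data.frame(Fruit = names(p_values), p.value = unname(p_values))
print(results)
```

Returning NA instead of printing a message keeps one row per fruit, so untestable fruits remain visible in the result rather than silently disappearing.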
I hope this helps!
Answer 1 (score: 1)
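Assuming fruits is coded as a categorical variable (i.e. a factor, as it should be), you can use sapply to iterate over the subset of the data for each fruit. In t.test we pass alternative="two.sided" only to emphasize it, since it is the default setting.
However, your data set is very small, and Bananas only has ripe observations. I will therefore use a larger sample data set to demonstrate.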
res <- sapply(levels(dat$fruits), function(x)
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)
res
#             Apple                     Banana                    Orange
# statistic   0.948231                  0.3432062                 0.4421971
# parameter   23.38387                  30.86684                  16.47366
# p.value     0.3527092                 0.7337699                 0.664097
# conf.int    Numeric,2                 Numeric,2                 Numeric,2
# estimate    Numeric,2                 Numeric,2                 Numeric,2
# null.value  0                         0                         0
# stderr      0.8893453                 1.16548                   1.043739
# alternative "two.sided"               "two.sided"               "two.sided"
# method      "Welch Two Sample t-test" "Welch Two Sample t-test" "Welch Two Sample t-test"
# data.name   "mean by ripeness"        "mean by ripeness"        "mean by ripeness"
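The result of sapply here is a matrix whose cells are list elements, which is why conf.int and estimate print as "Numeric,2". As a rough sketch of how individual values might be pulled out, the block below rebuilds the answer's sample data so that it runs on its own; the name p_values is introduced only for illustration, and the exact numbers can differ between R versions because the behaviour of sample() under a fixed seed changed in R 3.6.

```r
# Rebuild the answer's sample data (same construction as in the answer).
set.seed(42)
n <- 100
dat <- data.frame(
  fruits   = factor(sample(1:3, n, replace = TRUE),
                    labels = c("Apple", "Banana", "Orange")),
  ripeness = factor(rbinom(n, 1, 0.4), labels = c("yes", "no")),
  mean     = round(runif(n) * 10)
)

# One Welch t-test per fruit, as in the answer.
res <- sapply(levels(dat$fruits), function(x)
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ])
)

# A row of `res` is a list, so unlist() turns it into a named numeric vector.
p_values <- unlist(res["p.value", ])
print(p_values)

# A single cell is a length-one list; [[1]] extracts the interval itself.
print(res["conf.int", "Apple"][[1]])
```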
Data:
set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))
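Note that in the future you should include a minimal, self-contained example with appropriately formatted data (never images; read here to learn how to do that), along with all the steps you have tried so far, since Stack Overflow is not a coding service. Cheers!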