Question

totaldata$Age2 <- ifelse(totaldata$Age<=50, 0, 1)

t.test(totaldata$concernsubscorehiv, totaldata$Age2, alternative='two.sided', na.rm=TRUE, conf.level=.95, paired=FALSE)

This code gives the following result:

    Welch Two Sample t-test

data:  totaldata$concernsubscorehiv and totaldata$Age2
t = 33.19, df = 127.42, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.370758 3.798164
sample estimates:
mean of x mean of y 
 4.336842  0.752381

As you can see, the mean of group y is 0.752381.

Then I used this to estimate the mean of each group:

aggregate(totaldata$concernsubscorehiv~totaldata$Age2,data=totaldata,mean)

This produces:

totaldata$Age2 totaldata$concernsubscorehiv 
1              0        4.354286             
2              1        4.330612

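As you can see, the group means from aggregate (4.354286 for group 0 and 4.330612 for group 1) do not match the 0.752381 reported by the t-test. What is the problem?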

Answer 1

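You are not using t.test correctly. 0.752381 is the proportion of people whose Age2 is 1. You are supplying a vector of continuous data and a vector of zeros and ones, whereas what you want is to split the first vector into groups based on the second.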
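To see why the mean of a 0/1 vector is just the share of ones, here is a minimal sketch. The vector age2 is made up for illustration and only stands in for totaldata$Age2; it is not the asker's data.

```r
# Hypothetical 0/1 indicator standing in for totaldata$Age2 (assumed values)
age2 <- c(0, 1, 1, 1, 0, 1, 1, 0, 1, 1)

# The mean of a 0/1 vector ...
prop_by_mean <- mean(age2)

# ... is the same number as the proportion of ones, counted explicitly
prop_by_count <- sum(age2 == 1) / length(age2)

print(prop_by_mean)
print(prop_by_count)
```

So a "mean of y" near 0.75 from t.test(x, y) would simply say that about three quarters of the indicator values are 1. It says nothing about the score in either age group.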

Consider the following:

out <- rnorm(10)*5+100
bin <- rbinom(n=10, size=1, prob=0.5)

mean(out)
[1] 101.9462
mean(bin)
[1] 0.4

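From the ?t.test help file, we know that x and y are:

x    a (non-empty) numeric vector of data values.

y    an optional (non-empty) numeric vector of data values.

So, by supplying both out and bin, I am comparing the two vectors against each other, which probably does not make much sense in this example. See: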

t.test(out, bin)

    Welch Two Sample t-test

data:  out and bin
t = 86.665, df = 9.3564, p-value = 6.521e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  98.91092 104.18149
sample estimates:
mean of x mean of y 
 101.9462    0.4000

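Here you can see that t.test correctly estimated the means of the two vectors I supplied, as shown above. What you want to do instead is split the first vector according to whether the second is 0 or 1.

In my toy example, I can do this easily by writing: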

t.test(out[which(bin==1)], out[which(bin==0)])

    Welch Two Sample t-test

data:  out[which(bin == 1)] and out[which(bin == 0)]
t = 0.34943, df = 5.1963, p-value = 0.7405
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.828182  7.686176
sample estimates:
mean of x mean of y 
 102.5036  101.5746

Here, the two means correspond exactly to:

tapply(out, bin, mean)
       0        1 
101.5746 102.5036
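An alternative that avoids the manual subsetting is the formula interface of t.test, where the grouping variable goes on the right-hand side of ~. The sketch below uses a made-up data frame toy with an arbitrary seed and fixed group sizes; those names and values are assumptions for illustration, not the original data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical toy data: a continuous outcome and a 0/1 grouping variable
toy <- data.frame(
  out = rnorm(20) * 5 + 100,
  bin = rep(c(0, 1), each = 10)  # fixed groups so neither one is empty
)

# Formula interface: compare out between the two levels of bin
# (Welch test by default, because var.equal = FALSE)
res <- t.test(out ~ bin, data = toy)

# Group means reported by the test ...
print(res$estimate)

# ... agree with the group means computed directly
print(tapply(toy$out, toy$bin, mean))
```

With the asker's data, the analogous call would presumably be t.test(concernsubscorehiv ~ Age2, data = totaldata), whose group estimates should then match the aggregate output.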
