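I have a question about the sign of t in a paired-sample t-test run on the same data stored in different structures. I know the sign makes no difference to significance, but users are usually told that it indicates whether a value decreased or increased over time. So I need to make sure that the code I provide gives the same result, or that the result is interpreted correctly.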
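For example, we want to explain the results (and the code) to users of our software, which uses R (through Rdotnet in a C# program) for its statistics. I am not clear on the correct order of the variables in the two approaches in R.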
Method 1: using two vectors
## Sets seed for repetitive number generation
set.seed(2820)
## Creates the two vectors (before and after measurements)
preTest <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)
## Runs the paired-sample t-test directly on the two original vectors
t.test(preTest,postTest, paired = TRUE)
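The result is significant and has a positive t value, which tells me that the mean decreased from preTest to postTest (with paired = TRUE, t.test(x, y) tests the differences x - y, here preTest - postTest).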
Paired t-test
data: preTest and postTest
t = 7.1776, df = 99, p-value = 1.322e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 6.340533 11.185513
sample estimates:
mean of the differences
 8.763023
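A minimal sketch of why Method 1 gives a positive t: with paired = TRUE, t.test(x, y) is equivalent to a one-sample t-test on the differences x - y, so the order in which the two vectors are passed sets the sign. The object names paired, one_sample and reversed are only illustrative.

```r
## Recreate the two vectors from the question
set.seed(2820)
preTest  <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)

## Paired test in the original order: differences are preTest - postTest
paired <- t.test(preTest, postTest, paired = TRUE)

## One-sample test on the same differences gives the same t
one_sample <- t.test(preTest - postTest)

## Swapping the arguments only flips the sign of t (and of the CI)
reversed <- t.test(postTest, preTest, paired = TRUE)

print(c(paired     = unname(paired$statistic),
        one_sample = unname(one_sample$statistic),
        reversed   = unname(reversed$statistic)))
```

So a positive t here simply means "preTest minus postTest is positive on average"; the argument order, not the test, decides which direction is reported.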
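However, most people do not get their data as two vectors but from a file with BEFORE and AFTER values. I keep such data in a CSV file and import it during demos. To mimic this, I need to create a data frame with the structure our software users are used to seeing. After importing the CSV, "pstt" should look like the data frame I have.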
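If the imported file keeps one row per subject with separate BEFORE and AFTER columns (the shape of pstt), the direction can be fixed just by naming the columns in the call, without any reshaping. A sketch that mimics the CSV import with a temporary file; tempfile(), write.csv() and read.csv() are base R, and the temporary file is only a stand-in for the real CSV.

```r
set.seed(2820)
preTest  <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)
pstt <- data.frame(ID = 1:100, preTest, postTest)

## Write and re-read a CSV to mimic the file users import
csv_path <- tempfile(fileext = ".csv")
write.csv(pstt, csv_path, row.names = FALSE)
imported <- read.csv(csv_path)

## Columns are named explicitly, so the differences are preTest - postTest
res <- with(imported, t.test(preTest, postTest, paired = TRUE))
print(res)
```

Because each row holds one subject's BEFORE and AFTER values, the pairing is also guaranteed by the file layout rather than by row order within groups.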
Method 2: using a flat-file structure
## Converts the vectors into a data frame that looks like the way these
## data are normally stored in a CSV or Excel file
ID <- c(1:100)
pstt <- data.frame(ID,preTest,postTest)
## Puts the data in a form that can be used by R (grouping var | data var)
pstt2 <- data.frame(
  group = rep(c("preTest", "postTest"), each = 100),
  weight = c(preTest, postTest)
)
## Runs paired-sample T-Test on the newly structured data frame
t.test(weight ~ group, data = pstt2, paired = TRUE)
This t-test gives a negative t, which could suggest to users that the variable increased over time.
Paired t-test
data: weight by group
t = -7.1776, df = 99, p-value = 1.322e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.185513 -6.340533
sample estimates:
mean of the differences
 -8.763023
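A short sketch of where the negative sign in Method 2 comes from: the formula method converts the grouping variable to a factor, and factor() orders the levels alphabetically unless told otherwise, so "postTest" becomes the first level and the reported difference is postTest - preTest.

```r
group <- rep(c("preTest", "postTest"), each = 3)

## Default: levels sorted alphabetically ("pos..." sorts before "pre...")
print(levels(factor(group)))

## Explicit order: levels appear exactly as listed
print(levels(factor(group, levels = c("preTest", "postTest"))))
```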
Is there a way to explicitly define which group is BEFORE and which is AFTER? Or does the AFTER group have to come first in Method 2?
Thanks for any help or explanation.
Here is the complete R program I used:
## sets working dir
# setwd("C:\\Temp\\")
## runs file from command line
# source("paired_ttest.r",echo=TRUE)
## Sets seed for repetitive number generation
set.seed(2820)
## Creates the two vectors (before and after measurements)
preTest <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)
ID <- c(1:100)
## Converts the vectors into a data frame that looks like the way these
## data are normally stored
pstt <- data.frame(ID,preTest,postTest)
## Puts the data in a form that can be used by R (grouping var | data var)
pstt2 <- data.frame(
  group = rep(c("preTest", "postTest"), each = 100),
  weight = c(preTest, postTest)
)
print(pstt2)
## Runs the paired-sample t-test directly on the two original vectors
# t.test(preTest,postTest, paired = TRUE)
## Runs paired-sample T-Test on the newly structured data frame
t.test(weight ~ group, data = pstt2, paired = TRUE)
Answer
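Because t.test() converts group to a factor (if it is not one already; since R 4.0.0, data.frame() keeps strings as character by default), the order of the factor levels decides the direction: the first level is the reference, and the reported difference is the first level minus the second. By default, factor levels are sorted alphabetically, so "AFTER" comes before "BEFORE", and "postTest" comes before "preTest". You can set the reference level explicitly with relevel(), wrapping group in factor() so that this also works when group is a character column: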
t.test(weight ~ relevel(factor(group), ref = "preTest"), data = pstt2, paired = TRUE)
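A sketch of the same fix that does not depend on the formula interface. As far as I know, recent R releases (4.4.0 and later) no longer accept paired = TRUE in the formula method, so here the level order is fixed with factor(..., levels = ...) and the long data are split back into one vector per level before calling the default method. Pairing is by position, so each group's rows must be in the same subject order (they are here, because both halves were built in the same ID order).

```r
set.seed(2820)
preTest  <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)
pstt2 <- data.frame(
  group  = rep(c("preTest", "postTest"), each = 100),
  weight = c(preTest, postTest)
)

## Fix the order explicitly: BEFORE first, AFTER second
## (relevel(factor(pstt2$group), ref = "preTest") gives the same order here)
pstt2$group <- factor(pstt2$group, levels = c("preTest", "postTest"))

## split() returns one vector per level, in level order
by_group <- split(pstt2$weight, pstt2$group)

## First level minus second level: preTest - postTest, as in Method 1
res <- t.test(by_group[[1]], by_group[[2]], paired = TRUE)
print(res)
```

With the levels fixed this way, the sign matches Method 1 regardless of how the group labels happen to sort.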