Paired-sample t-test in R: a question of direction

Time: 2018-12-04 19:44:48

Tags: r statistics

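I have a question about the sign of t in a paired-sample t-test when the same data are supplied in different data structures. I know the sign makes no difference to significance, but it usually tells the user whether the measure decreased or increased over time. So I need to make sure the code I provide either produces the same result or is interpreted correctly.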

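I am using this as an example to explain results (and code) to users of our software, which uses R (through Rdotnet in a C# program) for statistics. I am unclear on the correct order of the variables in the two methods in R.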

Method 1: using two matrices

## Sets seed for repetitive number generation
set.seed(2820)

## Creates the matrices
preTest <- c(rnorm(100, mean = 145, sd = 9))
postTest <- c(rnorm(100, mean = 138, sd = 8))

## Runs paired-sample T-Test just on two original matrices
t.test(preTest,postTest, paired = TRUE)

The result is significant, and the positive t tells me that the mean decreased from preTest to postTest.

    Paired t-test

data:  preTest and postTest
t = 7.1776, df = 99, p-value = 1.322e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.340533 11.185513
sample estimates:
mean of the differences 
               8.763023
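As a minimal sketch of why the sign comes out positive here: with two vectors, t.test(x, y, paired = TRUE) tests the mean of x - y, so the sign follows the argument order. The object names pre_minus_post, post_minus_pre and diff_test are hypothetical and only used for illustration.

```r
set.seed(2820)
preTest  <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)

# With two vectors, the paired test is on x - y, in the order given
pre_minus_post <- t.test(preTest, postTest, paired = TRUE)
post_minus_pre <- t.test(postTest, preTest, paired = TRUE)

# Same magnitude and p-value, opposite sign
unname(pre_minus_post$statistic)
unname(post_minus_pre$statistic)

# The paired test matches a one-sample test on the differences
diff_test <- t.test(preTest - postTest, mu = 0)
all.equal(unname(diff_test$statistic), unname(pre_minus_post$statistic))
```

Swapping the two arguments only changes the sign of t, of the estimate and of the confidence limits; the p-value is unchanged.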

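However, most people get their data not from two matrices but from a file with BEFORE and AFTER values. I would have these data in a csv and import them during a demonstration. So, to mimic this, I need to create the data frame in the structure our software users are used to seeing. "pstt" should look like the data frame I would have after importing a csv.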
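A minimal sketch of the import step described above, assuming the csv has one row per subject with columns ID, preTest and postTest. The temporary file path and the names csv_path, imported and wide_test are hypothetical. Because the two columns are named explicitly, the direction of the difference is visible in the call.

```r
set.seed(2820)
pstt <- data.frame(
  ID       = 1:100,
  preTest  = rnorm(100, mean = 145, sd = 9),
  postTest = rnorm(100, mean = 138, sd = 8)
)

# Round-trip through a temporary csv to mimic the import (hypothetical path)
csv_path <- tempfile(fileext = ".csv")
write.csv(pstt, csv_path, row.names = FALSE)
imported <- read.csv(csv_path)

# Columns are passed in BEFORE, AFTER order: the test is on preTest - postTest
wide_test <- with(imported, t.test(preTest, postTest, paired = TRUE))
wide_test$estimate
```

In this wide layout no grouping variable is involved, so the order of the comparison does not depend on how group labels sort.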

Method 2: using a flat-file structure

## Converts the matrices into a dataframe that looks like the way these
## data are normally stored in a csv or Excel

ID <- c(1:100)
pstt <- data.frame(ID,preTest,postTest)

## Puts the data in a form that can be used by R (grouping var | data var)
pstt2 <- data.frame(
                group = rep(c("preTest","postTest"),each = 100),
                weight = c(preTest, postTest)
                )

## Runs paired-sample T-Test on the newly structured data frame
t.test(weight ~ group, data = pstt2, paired = TRUE)

This t-test gives a negative t, which could suggest to the user that the variable under study increased over time.

    Paired t-test

data:  weight by group 
t = -7.1776, df = 99, p-value = 1.322e-10
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -11.185513  -6.340533 
sample estimates:
mean of the differences 
              -8.763023

Is there a way to explicitly define which group is BEFORE and which is AFTER? Or do you have to put the AFTER group first in Method 2?

Thanks for any help or explanation.

Here is the full R program I used:

## sets working dir
#  setwd("C:\\Temp\\")

## runs file from command line
#  source("paired_ttest.r",echo=TRUE)

## Sets seed for repetitive number generation
set.seed(2820)

## Creates the matrices
preTest <- c(rnorm(100, mean = 145, sd = 9))
postTest <- c(rnorm(100, mean = 138, sd = 8))
ID <- c(1:100)

## Converts the matrices into a dataframe that looks like the way these
## data are normally stored
pstt <- data.frame(ID,preTest,postTest)

## Puts the data in a form that can be used by R (grouping var | data var)
pstt2 <- data.frame(
                group = rep(c("preTest","postTest"),each = 100),
                weight = c(preTest, postTest)
                )

print(pstt2)                

## Runs paired-sample T-Test just on two original matrices
#  t.test(preTest,postTest, paired = TRUE)

## Runs paired-sample T-Test on the newly structured data frame
t.test(weight ~ group, data = pstt2, paired = TRUE)

1 answer:

Answer 0 (score: 2)

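Because group is a factor, t.test uses the first level of that factor as the reference level. By default, factor levels are sorted alphabetically, so "AFTER" comes before "BEFORE" and "postTest" comes before "preTest". You can use relevel() to set the reference level of the factor explicitly.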

t.test(weight ~ relevel(factor(group), "preTest"), data = pstt2, paired = TRUE)
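A minimal sketch of the same idea with the level order stated when the factor is built, rather than through relevel(). It rests on assumptions that are not part of the answer above. In R 4.0.0 and later, data.frame() keeps strings as character by default, so the sketch calls factor() itself. Recent R releases (4.4.0 onward, as far as I know) reject paired = TRUE in the formula interface, so the sketch splits the long data back into two vectors and uses the two-vector form. Rows within each group are assumed to be in the same subject order. The names default_levels, by_group and releveled_test are hypothetical.

```r
set.seed(2820)
preTest  <- rnorm(100, mean = 145, sd = 9)
postTest <- rnorm(100, mean = 138, sd = 8)

pstt2 <- data.frame(
  group  = rep(c("preTest", "postTest"), each = 100),
  weight = c(preTest, postTest)
)

# Default level order is alphabetical, so "postTest" sorts before "preTest"
default_levels <- levels(factor(pstt2$group))
default_levels

# State the order explicitly: BEFORE group first, AFTER group second
pstt2$group <- factor(pstt2$group, levels = c("preTest", "postTest"))

# split() returns one element per level, in the level order set above
by_group <- split(pstt2$weight, pstt2$group)

# Paired test on (first level) - (second level), i.e. preTest - postTest
releveled_test <- t.test(by_group[[1]], by_group[[2]], paired = TRUE)
releveled_test$statistic
```

With the levels set this way, the sign of t matches Method 1, because the difference is again taken as preTest minus postTest.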