Question

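I have two data frames: one with covariates for the patient samples, and one with methylation data for the same samples. I need to run t-tests that compare the methylation data by sex.

My data frames look like this. Covariates: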

        "patient"   "sex"   "ethnicity"
sample1    p1         0      caucasian
sample2    p2         1      caucasian
sample3    p3         1      caucasian
sample4    p4         0      caucasian
sample5    p5         0      caucasian
sample6    p6         1      caucasian

and so on, up to sample46.

Methylation:

       sample1  sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe2  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe3  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe4  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111

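and so on, for 80,000 different probes and 46 different samples. So if I want to run a series of t-tests comparing the methylation data by sex for the first 8 samples, can I specify: t.test(t(methylation[,1:8]) ~ covariates$sex)? Or is there a way to bind on the sample names (sample1, sample2, ...)? (Sorry, I'm new to both R and statistics.)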

Answer 1

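A simple approach is to build a single data.frame, methyl_cov_df, and then use the formula interface of t.test.

Below is an example t.test of probe1 by sex for the first 6 samples (adjust the number of samples as needed):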

# combined data frame
methyl_cov_df <- cbind(t(methylation[,1:6]),covariates)

methyl_cov_df:

        probe1 probe2 probe3 probe4 patient sex ethnicity
sample1 0.1111 0.1111 0.1111 0.1111      p1   0 caucasian
sample2 0.2222 0.2222 0.2222 0.2222      p2   1 caucasian
sample3 0.3333 0.3333 0.3333 0.3333      p3   1 caucasian
sample4 0.4444 0.4444 0.4444 0.4444      p4   0 caucasian
sample5 0.5555 0.5555 0.5555 0.5555      p5   0 caucasian
sample6 0.6666 0.6666 0.6666 0.6666      p6   1 caucasian
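
The question also asks whether samples can be matched by name. The cbind call above pairs rows by position, so it quietly assumes both tables list the samples in the same order. Here is a minimal sketch of name-based alignment, assuming sample names are the column names of methylation and the row names of covariates, as in the example data. The toy values and the `shared` variable are illustrative only.

```r
# Toy data shaped like the example: probes x samples, and samples x covariates
methylation <- data.frame(
  sample1 = c(0.1111, 0.1111), sample2 = c(0.2222, 0.2222),
  sample3 = c(0.3333, 0.3333), sample4 = c(0.4444, 0.4444),
  sample5 = c(0.5555, 0.5555), sample6 = c(0.6666, 0.6666),
  sample7 = c(0.7777, 0.7777),
  row.names = c("probe1", "probe2")
)
covariates <- data.frame(
  patient = paste0("p", 1:6),
  sex = c(0, 1, 1, 0, 0, 1),
  ethnicity = "caucasian",
  row.names = paste0("sample", 1:6)
)

# Shuffle the covariate rows to show that row order no longer matters
covariates <- covariates[c(3, 1, 6, 2, 5, 4), ]

# Samples present in both tables, in one shared order (hypothetical helper name)
shared <- intersect(colnames(methylation), rownames(covariates))

# Index both tables by sample name before binding them together
methyl_cov_df <- cbind(
  t(methylation[, shared]),  # samples become rows, probes become columns
  covariates[shared, ]
)
```

Indexing both tables with the same vector of names also drops any sample that lacks covariates (sample7 here), which positional slicing such as methylation[,1:6] would not detect.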


# t.test by formula: slice the data.frame to use the number of samples: done for 6 below
t.test(formula = probe1~sex, data= methyl_cov_df[1:6,])

Welch Two Sample t-test

data:  probe1 by sex
t = -0.19612, df = 4, p-value = 0.8541
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -0.5613197  0.4872530
sample estimates:
mean in group 0 mean in group 1 
      0.3703333       0.4073667 
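
To cover every probe rather than only probe1, the same test can be applied row by row. This is a rough sketch, assuming the methylation columns are already in the same sample order as the sex vector. The toy matrix and the names `beta`, `p_values` and `results` are illustrative. With 80,000 probes, some multiple-testing adjustment is usually worth considering; the "BH" method below is an assumption, not something the original answer prescribes.

```r
# Toy data: 4 probes x 6 samples, matching the example values
beta <- c(0.1111, 0.2222, 0.3333, 0.4444, 0.5555, 0.6666)
methylation <- as.data.frame(matrix(
  rep(beta, each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:6))
))

# Sex for sample1..sample6, in the same order as the methylation columns
sex <- factor(c(0, 1, 1, 0, 0, 1))

# Run one Welch t-test per probe (row) and keep only the p-value
p_values <- apply(as.matrix(methylation), 1, function(probe_values) {
  t.test(probe_values ~ sex)$p.value
})

# Optional: adjust for the number of tests (Benjamini-Hochberg assumed here)
results <- data.frame(
  p_value = p_values,
  p_adjusted = p.adjust(p_values, method = "BH")
)
```

The result has one row per probe, named after the probe, so it can be sorted or filtered by p-value afterwards.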

Data:

covariates <- read.table(text = '        "patient"   "sex"   "ethnicity"
sample1    p1         0      caucasian
           sample2    p2         1      caucasian
           sample3    p3         1      caucasian
           sample4    p4         0      caucasian
           sample5    p5         0      caucasian
           sample6    p6         1      caucasian', header = T)

methylation <- read.table(text = "       sample1  sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe2  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe3  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe4  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111", header = T)

Running a series of t-tests between two data frames using covariates
