Multiple T-test in R

Time: 2015-06-30 13:38:26

Tags: r hypothesis-test

I have 94 variables (sample + proteins + group) and 172 observations in a matrix like this:

Sample   Protein1   Protein2 ... Protein92 Group
1          1.53      3.325   ...   5.63      0
2          2.32      3.451   ...   6.32      0
.
. 
.
103        3.24      4.21    ...   3.53      0               
104        3.44      5.22    ...   6.78      1
.
.
.
192        6.75      4.34    ...   6.15      1

Some of the samples are in group 0 and some are in group 1. I want to test whether there is a difference between group 0 and group 1 using a t-test, and I want to do it for all the proteins. I was thinking of using apply, but I am not sure how to use it. Also, the real names are not Protein1, Protein2, ...; they are much longer, so I would rather not have to write them all out.

I would also like only the p-value for each protein, in a matrix, something like this:

Protein  p-value
Protein1   0.00563
Protein2   0.0640
.
.
Protein92  0.610

Or something similar, so that afterwards I can find just the ones with a p-value lower than 0.05/92.

1 answer:

Answer 0 (score: 6)

Try something like:

sapply(df[,2:93], function(i) t.test(i ~ df$Group)$p.value)

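This returns a named numeric vector of p-values, one per protein column. Note that t.test performs Welch's unequal-variance test by default.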
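Since the real column names are long, it may be easier to pick the protein columns by excluding the non-protein ones instead of relying on positions 2:93. Below is a minimal sketch on made-up data; the toy data frame, its three protein columns, and the names protein_cols and p_values are assumptions for illustration, and it presumes the identifier columns are called Sample and Group as in the question.

```r
# Toy data standing in for the real matrix (hypothetical values).
set.seed(1)
df <- data.frame(
  Sample   = 1:20,
  Protein1 = rnorm(20),
  Protein2 = rnorm(20),
  Protein3 = rnorm(20),
  Group    = rep(c(0, 1), each = 10)
)

# Select the protein columns by name: everything except Sample and Group.
protein_cols <- setdiff(names(df), c("Sample", "Group"))

# Run a two-sample t-test (Welch by default) on each protein column
# and keep only the p-value.
p_values <- sapply(df[protein_cols], function(col) t.test(col ~ df$Group)$p.value)
p_values
```

The result is named after the columns, so the long protein names never have to be typed.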

You could store this as a data.frame and look for low p-values by doing this:

x <- data.frame(p.value= sapply(df[,2:93], function(i) t.test(i ~ df$Group)$p.value))
x$protein_name <- rownames(x) # edit: new column for protein_name
rownames(x) <- NULL           # edit: drop row names so the name is not printed twice
x[x$p.value < 0.05/92,]

Note that the names of the vector elements and the row names of the data frame keep the column names (Protein1, Protein2, etc.). Edit: I added a protein_name column, as the OP intended, and removed the row names so the name would not appear twice when printed.
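The 0.05/92 cutoff is a Bonferroni correction, and the same filter can be written with base R's p.adjust, which adjusts the p-values instead of the threshold. A small sketch with invented p-values follows; the four-row data frame and the p.adjusted and significant names are hypothetical, not taken from the question.

```r
# Hypothetical raw p-values, for illustration only.
x <- data.frame(
  protein_name = c("Protein1", "Protein2", "Protein3", "Protein4"),
  p.value      = c(0.0001, 0.02, 0.2, 0.6)
)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
x$p.adjusted <- p.adjust(x$p.value, method = "bonferroni")

# Filtering adjusted p-values at 0.05 selects the same rows as
# comparing the raw p-values against 0.05 / nrow(x).
significant <- x[x$p.adjusted < 0.05, ]
significant
```

Swapping method = "bonferroni" for another method that p.adjust supports, such as "holm" or "BH", changes the correction without touching the rest of the code.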

P.S. Glad to see you are adjusting the p-value threshold for multiple comparisons.