I have 94 variables (sample + proteins + group) and 172 observations in a matrix like this:
Sample Protein1 Protein2 ... Protein92 Group
1 1.53 3.325 ... 5.63 0
2 2.32 3.451 ... 6.32 0
.
.
.
103 3.24 4.21 ... 3.53 0
104 3.44 5.22 ... 6.78 1
.
.
.
192 6.75 4.34 ... 6.15 1
Some of the samples are in group 0 and some are in group 1. I want to test whether there is a difference between group 0 and group 1 using a t-test, and I want to do it for every protein. I was thinking of using apply, but I am not sure how to use it. Also, the real column names are not Protein1, Protein2, ...; they are much longer, so I would rather not have to type them all.
I would also like just the p-value for each protein, in a matrix something like this:
Protein p-value
Protein1 0.00563
Protein2 0.0640
.
.
Protein92 0.610
Or something similar, so that afterwards I can pick out just the ones with a p-value lower than 0.05/92.
Answer 0 (score: 6)
Try something like:
sapply(df[,2:93], function(i) t.test(i ~ df$Group)$p.value)
This returns a named numeric vector of p-values, one per protein column.
You could store the result in a data.frame and look for low p-values like this:
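As a self-contained sketch of the same idea, the toy example below simulates a small matrix and runs the test on it. The data, the column names (ProtA, ProtB, ProtC) and the helper variable protein_cols are made up for illustration. It assumes the matrix is first converted with as.data.frame(), and it selects the protein columns by excluding Sample and Group rather than by position, so no protein name has to be typed.

```r
# Toy data standing in for the real matrix: 20 samples, 3 proteins (all values simulated).
set.seed(1)
m <- cbind(Sample = 1:20,
           ProtA = rnorm(20), ProtB = rnorm(20), ProtC = rnorm(20),
           Group = rep(0:1, each = 10))

# The formula interface of t.test is easiest with a data frame, so convert the matrix.
df <- as.data.frame(m)

# Pick the protein columns by exclusion (hypothetical helper variable).
protein_cols <- setdiff(names(df), c("Sample", "Group"))

# One t-test per protein column (Welch's test by default); sapply keeps the column names.
p_values <- sapply(df[protein_cols], function(i) t.test(i ~ df$Group)$p.value)
p_values
```

Selecting by name also keeps working if the number of protein columns changes, which a fixed range such as 2:93 would not.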
x <- data.frame(p.value= sapply(df[,2:93], function(i) t.test(i ~ df$Group)$p.value))
x$protein_name <- rownames(x) # edit: new column for protein_name
rownames(x) <- NULL # edit: drop the row names so the protein name is not printed twice
x[x$p.value < 0.05/92,]
Note that the names of the vector elements, and the row names of the data frame built from it, keep the protein column names (Protein1, Protein2, etc.). Edit: I added a protein_name column, per the OP's intent, and removed the row names so that the names would not appear twice when printed.
P.S. Glad to see you are adjusting the p-value threshold for multiple comparisons.
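For what it's worth, dividing the threshold by the number of tests (0.05/92) is the Bonferroni correction, and base R's p.adjust() can apply it to the p-values instead. A minimal sketch, using made-up p-values and hypothetical protein names rather than real results:

```r
# Hypothetical raw p-values for four proteins (made-up numbers, for illustration only).
p_values <- c(ProtA = 0.0001, ProtB = 0.0200, ProtC = 0.0400, ProtD = 0.6100)

# Bonferroni adjustment multiplies each p-value by the number of tests (capped at 1).
p_adj <- p.adjust(p_values, method = "bonferroni")

# Keeping adjusted p-values below 0.05 selects the same proteins here as
# comparing the raw p-values with 0.05 / number of tests.
hits_adjusted  <- names(p_adj)[p_adj < 0.05]
hits_threshold <- names(p_values)[p_values < 0.05 / length(p_values)]
```

With p.adjust() the cutoff stays at 0.05 however many proteins are tested, and other methods (for example "holm" or "BH") can be swapped in through the method argument.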