从data.frame自动提取p值

时间:2014-12-08 11:14:36

标签: r dataframe extraction p-value

我想比较两组患者的蛋白质表达值(n = 465种蛋白质)(耐药性与敏感性)。

我有11名耐药患者和8名敏感患者。我想比较(ttest)抗性组蛋白1(A resK res)的表达值与敏感组(L sensS sens)的表达值,蛋白质2(抗性)与蛋白质2(敏感),等等。作为输出,我只想要p值<0.05的蛋白质​​。

我试图这样做(见下文),但是出了点问题,我无法弄清楚是什么。

        X Protein.1 Protein.2 Protein.3 Protein.4 Protein.5 Protein.6
1   A res      4127     16886      1785      1636       407       135
2   B res     10039     32414      3144      1543       601       154
3   C res       527      1059      1637       317       229       107
4   D res       553      3848      7357      1168      1549       441
5   E res      2351      2272      5868      2606       517       159
6   F res       822      1767      2110       818       293        75
7   G res       673      1887       511       471       214        NA
8   H res      5769      2206      2041       517       355       298
9   I res      1660      4221      1921       629       383       104
10  J res      3281      1804      2400       225       268        52
11  K res      3383      1882      1935       185        NA        NA
12 L sens     10810     20136      2350      1143       527       160
13 M sens      5941     14873      3550       943       308        NA
14 N sens      1100      2325      1359       561       542       284
15 O sens        85       587       619       364        85        52
16 P sens      2321      6335      6494       994        NA        NA
17 Q sens    103810      7102      7986      1464       439       187
18 R sens      1174      2076      1423       340       186        70
19 S sens      1829       973      1343       380       453       221

data <- read.csv("ProteinDataResSens.csv", sep=";", na.strings="weak", header=TRUE)

res <- data.frame(data[1:11, ], row.names=NULL)
colnames(res) <- paste("res", 1:length(res), sep="_")
sens <- data.frame(data[12:19, ], row.names=NULL)
colnames(sens) <- paste("sens", 1:length(sens), sep="_")
com <- combn(c(colnames(res), colnames(sens)), 2)

p <- apply(com, 2, function(x) t.test(data[, x[1]], data[, x[2]])$p.val)
data.frame(comparison=paste(com[1, ], com[2, ],sep=" vs."), p.value=p)

非常感谢您的帮助!

1 个答案:

答案 0 :(得分:2)

如果您想将每个res列的sensProtein进行比较

grp <- sub(".* ", "", df$X)
Pvals <- mapply(function(x,y) t.test(x[grp=='res'], 
                x[grp=='sens'])$p.value, df[,-1], list(grp))

Pvals[Pvals < 0.05]

或使用data.table

library(data.table)
setDT(df)[, grp:= sub('.* ', "", X)][, lapply(.SD, 
      function(x) t.test(x[grp=='res'], x[grp=='sens'])$p.value),
                                              .SDcols=2:(ncol(df)-1)]

数据

 df <- structure(list(X = c("A res", "B res", "C res", "D res", "E res", 
 "F res", "G res", "H res", "I res", "J res", "K res", "L sens", 
 "M sens", "N sens", "O sens", "P sens", "Q sens", "R sens", "S sens"
 ), Protein.1 = c(4127L, 10039L, 527L, 553L, 2351L, 822L, 673L, 
 5769L, 1660L, 3281L, 3383L, 10810L, 5941L, 1100L, 85L, 2321L, 
 103810L, 1174L, 1829L), Protein.2 = c(16886L, 32414L, 1059L, 
 3848L, 2272L, 1767L, 1887L, 2206L, 4221L, 1804L, 1882L, 20136L, 
 14873L, 2325L, 587L, 6335L, 7102L, 2076L, 973L), Protein.3 = c(1785L, 
 3144L, 1637L, 7357L, 5868L, 2110L, 511L, 2041L, 1921L, 2400L, 
 1935L, 2350L, 3550L, 1359L, 619L, 6494L, 7986L, 1423L, 1343L), 
 Protein.4 = c(1636L, 1543L, 317L, 1168L, 2606L, 818L, 471L, 
 517L, 629L, 225L, 185L, 1143L, 943L, 561L, 364L, 994L, 1464L, 
 340L, 380L), Protein.5 = c(407L, 601L, 229L, 1549L, 517L, 
 293L, 214L, 355L, 383L, 268L, NA, 527L, 308L, 542L, 85L, 
 NA, 439L, 186L, 453L), Protein.6 = c(135L, 154L, 107L, 441L, 
 159L, 75L, NA, 298L, 104L, 52L, NA, 160L, NA, 284L, 52L, 
 NA, 187L, 70L, 221L)), .Names = c("X", "Protein.1", "Protein.2", 
 "Protein.3", "Protein.4", "Protein.5", "Protein.6"), class =
 "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
 "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19"))