dx <- data.frame(CMPD = c("cmpd1","cmpd1","cmpd1","cmpd1","cmpd2","cmpd2",
"cmpd2","cmpd2","cmpd3","cmpd3","cmpd3","cmpd3"),
MRM = c("309.0/121.1","309.0/121.1","309.0/90.1",
"309.0/90.1","305.2/140.3","305.2/140.3","300.5/107.3",
"300.5/107.3","404.8/126.0","404.8/126.0","401.5/91.0",
"401.5/91.0"),
RESP = c(123.4,234.5,345.6,456.7,567.8,678.9,789.0,12.4,
23.5,34.6,45.7,56.8))
>dx
CMPD MRM RESP
1 cmpd1 309.0/121.1 123.4
2 cmpd1 309.0/121.1 234.5
3 cmpd1 309.0/90.1 345.6
4 cmpd1 309.0/90.1 456.7
5 cmpd2 305.2/140.3 567.8
6 cmpd2 305.2/140.3 678.9
7 cmpd2 300.5/107.3 789.0
8 cmpd2 300.5/107.3 12.4
9 cmpd3 404.8/126.0 23.5
10 cmpd3 404.8/126.0 34.6
11 cmpd3 401.5/91.0 45.7
12 cmpd3 401.5/91.0 56.8
I would like to be able to process this data by each unique combination of CMPD and MRM (for example, rows 1 and 2, then rows 3 and 4, and so on).
Answer 0 (score: 5)
Let me introduce you to my friend, the package plyr.
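This package makes it easy to apply the general strategy of splitting, applying, and combining data. One of its most useful functions is ddply, which takes a data frame as input and returns a data frame as output. You specify the unique combinations to split on and the function to apply, and ddply does the rest.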
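As a rough illustration of what ddply automates, the three steps can be spelled out by hand in base R. This is only a sketch on a cut-down copy of the question's data; the names pieces, summaries, and result are made up for the example, and nothing here is meant to describe how plyr works internally.

```r
# Cut-down copy of the question's data (first four rows only).
dx <- data.frame(
  CMPD = c("cmpd1", "cmpd1", "cmpd1", "cmpd1"),
  MRM  = c("309.0/121.1", "309.0/121.1", "309.0/90.1", "309.0/90.1"),
  RESP = c(123.4, 234.5, 345.6, 456.7)
)

# Split: one data frame per CMPD/MRM combination.
# drop = TRUE skips combinations that never occur in the data.
pieces <- split(dx, list(dx$CMPD, dx$MRM), drop = TRUE)

# Apply: reduce each piece to a one-row summary.
summaries <- lapply(pieces, function(piece) {
  data.frame(CMPD = piece$CMPD[1],
             MRM  = piece$MRM[1],
             RESP = mean(piece$RESP))
})

# Combine: stack the one-row summaries into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result
```

The ddply call does the same thing in one line, which is why it is usually the more convenient choice.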
A good place to learn about plyr is Hadley's website or his article in the Journal of Statistical Software. There are hundreds of answers about plyr on StackOverflow. Just follow the plyr or ddply tags.
Here are some examples:
library(plyr)
Extract the means:
> ddply(dx, .(CMPD, MRM), numcolwise(mean))
CMPD MRM RESP
1 cmpd1 309.0/121.1 178.95
2 cmpd1 309.0/90.1 401.15
3 cmpd2 300.5/107.3 400.70
4 cmpd2 305.2/140.3 623.35
5 cmpd3 401.5/91.0 51.25
6 cmpd3 404.8/126.0 29.05
Or the sums:
> ddply(dx, .(CMPD, MRM), numcolwise(sum))
CMPD MRM RESP
1 cmpd1 309.0/121.1 357.9
2 cmpd1 309.0/90.1 802.3
3 cmpd2 300.5/107.3 801.4
4 cmpd2 305.2/140.3 1246.7
5 cmpd3 401.5/91.0 102.5
6 cmpd3 404.8/126.0 58.1
Answer 1 (score: 2)
If you want to process whole subsets of a data frame, the usual approach is ddply from the plyr package:
ddply(dx, .(CMPD, MRM), .fun = doStuff)
Alternatives are ave, by, and aggregate. For the specific example of computing a ratio, summarise helps a lot:
ddply(dx, .(CMPD, MRM), .fun = summarise, ratio = RESP[1]/RESP[2])
This type of task is usually called "split-apply-combine" in the R world.
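For comparison, here is a minimal sketch of two of the base R alternatives mentioned above, aggregate and ave, run on a shortened copy of the question's data. The variable agg and the column group_mean are names invented for this example.

```r
# Shortened copy of the question's data (six rows, three CMPD/MRM groups).
dx <- data.frame(
  CMPD = c("cmpd1", "cmpd1", "cmpd1", "cmpd1", "cmpd2", "cmpd2"),
  MRM  = c("309.0/121.1", "309.0/121.1", "309.0/90.1", "309.0/90.1",
           "305.2/140.3", "305.2/140.3"),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9)
)

# aggregate: one output row per CMPD/MRM combination present in the data.
agg <- aggregate(RESP ~ CMPD + MRM, data = dx, FUN = mean)
agg

# ave: output has the same length as the input, so every row
# receives the mean of the group it belongs to.
dx$group_mean <- ave(dx$RESP, dx$CMPD, dx$MRM, FUN = mean)
dx
```

aggregate is the closer match to ddply with a summary function, while ave is handy when the group statistic needs to sit next to the original rows.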
Answer 2 (score: 2)
You can use the by function:
by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)
It returns a by object, which is not always easy to work with, but it can be done.
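One possible way to work with the result, sketched here on a shortened copy of the question's data, is to treat the by object as the array it is underneath and flatten it into a data frame. The names res and flat are invented for this example, and this is just one option among several.

```r
# Shortened copy of the question's data (six rows, three CMPD/MRM groups).
dx <- data.frame(
  CMPD = c("cmpd1", "cmpd1", "cmpd1", "cmpd1", "cmpd2", "cmpd2"),
  MRM  = c("309.0/121.1", "309.0/121.1", "309.0/90.1", "309.0/90.1",
           "305.2/140.3", "305.2/140.3"),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9)
)

res <- by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)

# Strip the "by" class to get a plain array with CMPD and MRM dimnames,
# then flatten it to one row per cell.
flat <- as.data.frame.table(unclass(res), responseName = "RESP")

# Cells for CMPD/MRM combinations that never occur hold NA; drop them.
flat <- flat[!is.na(flat$RESP), ]
flat
```

Note that by fills in every possible CMPD/MRM pairing, which is why the NA filter is needed; ddply and aggregate only return the combinations that actually appear.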