How do I select rows based on 2 columns?

Asked: 2011-08-24 15:58:50

Tags: r

dx <- data.frame(CMPD = c("cmpd1","cmpd1","cmpd1","cmpd1","cmpd2","cmpd2",
                          "cmpd2","cmpd2","cmpd3","cmpd3","cmpd3","cmpd3"),
                 MRM = c("309.0/121.1","309.0/121.1","309.0/90.1",
                         "309.0/90.1","305.2/140.3","305.2/140.3","300.5/107.3",
                         "300.5/107.3","404.8/126.0","404.8/126.0","401.5/91.0",
                         "401.5/91.0"),
                 RESP = c(123.4,234.5,345.6,456.7,567.8,678.9,789.0,12.4,
                          23.5,34.6,45.7,56.8))

> dx
    CMPD         MRM  RESP
1  cmpd1 309.0/121.1 123.4
2  cmpd1 309.0/121.1 234.5
3  cmpd1  309.0/90.1 345.6
4  cmpd1  309.0/90.1 456.7
5  cmpd2 305.2/140.3 567.8
6  cmpd2 305.2/140.3 678.9
7  cmpd2 300.5/107.3 789.0
8  cmpd2 300.5/107.3  12.4
9  cmpd3 404.8/126.0  23.5
10 cmpd3 404.8/126.0  34.6
11 cmpd3  401.5/91.0  45.7
12 cmpd3  401.5/91.0  56.8

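I would like to process this data according to the unique combinations of CMPD and MRM (for example, rows 1 and 2 together, then rows 3 and 4, and so on).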

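3 answers:

Answer 0: (score: 5)

Let me introduce you to my friend, the plyr package.

This package makes it easy to use the general strategy of splitting, applying, and combining data. One of its most useful functions is ddply, which takes a data frame as input and returns a data frame as output. You specify the unique combinations to split on and the function to apply, and ddply does the rest.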
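To make the strategy concrete, here is a minimal base-R sketch of the three steps that ddply automates. It is not how plyr is implemented internally; the names pieces, summaries, and result are chosen for illustration only, and dx is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's data frame so this sketch is self-contained.
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9,
           789.0, 12.4, 23.5, 34.6, 45.7, 56.8)
)

# Split: one data frame per CMPD/MRM combination that actually occurs.
pieces <- split(dx, list(dx$CMPD, dx$MRM), drop = TRUE)

# Apply: reduce each piece to a one-row summary (here, the mean of RESP).
summaries <- lapply(pieces, function(piece) {
  data.frame(CMPD = piece$CMPD[1], MRM = piece$MRM[1],
             RESP = mean(piece$RESP))
})

# Combine: stack the one-row summaries back into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)
```

The row order of result may differ from what ddply returns, but each CMPD/MRM combination should appear exactly once.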

Good places to learn about plyr are Hadley's website and his article in the Journal of Statistical Software. There are also hundreds of answers about plyr on StackOverflow. Just follow the relevant tags.

Here are some examples:

library(plyr)

Extract the means:

> ddply(dx, .(CMPD, MRM), numcolwise(mean))
   CMPD         MRM   RESP
1 cmpd1 309.0/121.1 178.95
2 cmpd1  309.0/90.1 401.15
3 cmpd2 300.5/107.3 400.70
4 cmpd2 305.2/140.3 623.35
5 cmpd3  401.5/91.0  51.25
6 cmpd3 404.8/126.0  29.05

Or the sums:

> ddply(dx, .(CMPD, MRM), numcolwise(sum))
   CMPD         MRM   RESP
1 cmpd1 309.0/121.1  357.9
2 cmpd1  309.0/90.1  802.3
3 cmpd2 300.5/107.3  801.4
4 cmpd2 305.2/140.3 1246.7
5 cmpd3  401.5/91.0  102.5
6 cmpd3 404.8/126.0   58.1

Answer 1: (score: 2)

If you want to process whole subsets of a data frame, the common approach is to use ddply from the plyr package:

ddply(dx, .(CMPD, MRM), .fun = doStuff)

Alternatives are ave, by, and aggregate. For the specific example of computing a ratio, summarise helps a lot:

ddply(dx, .(CMPD, MRM), .fun = summarise, ratio = RESP[1]/RESP[2])

In the R world, this type of task is commonly called "split-apply-combine".
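As a rough illustration of two of the base-R alternatives named in this answer, the sketch below uses aggregate to get one row per group and ave to attach a per-group value to every original row. The column name GROUP_MEAN and the variable means are made up for this example, and dx is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's data frame so this sketch is self-contained.
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9,
           789.0, 12.4, 23.5, 34.6, 45.7, 56.8)
)

# aggregate: one row per CMPD/MRM combination, holding the mean of RESP.
means <- aggregate(RESP ~ CMPD + MRM, data = dx, FUN = mean)
print(means)

# ave: returns a vector as long as dx, so every row carries its group's mean.
# GROUP_MEAN is a hypothetical column name used only in this sketch.
dx$GROUP_MEAN <- ave(dx$RESP, dx$CMPD, dx$MRM, FUN = mean)
print(dx)
```

aggregate suits cases where you want a summary table, while ave suits cases where the per-group value is needed alongside the original rows.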

Answer 2: (score: 2)

You can use the by function:

by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)

It returns a by object, which is not necessarily easy to work with, but it can be done.
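One possible way to work with the result, sketched under the assumption that the function returns a single number per group (as mean does here): drop the by class to get a plain matrix, then flatten it into a data frame. The names res, m, flat, and MEAN_RESP are chosen for illustration, and dx is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's data frame so this sketch is self-contained.
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9,
           789.0, 12.4, 23.5, 34.6, 45.7, 56.8)
)

res <- by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)

# Underneath, the by object is a CMPD-by-MRM matrix; combinations that
# never occur in the data are filled with NA.
m <- unclass(res)
m["cmpd1", "309.0/121.1"]  # look up a single group by name

# Flatten to long format and keep only the combinations that exist.
flat <- as.data.frame(as.table(m), responseName = "MEAN_RESP")
flat <- flat[!is.na(flat$MEAN_RESP), ]
print(flat)
```

If the applied function returns something more complex than a single number, the by object is a list-like array instead, and this flattening approach would need to be adapted.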