Question

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2))

a.3[r,]

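This returns the index within each list element, not the index into the whole data.frame.

I'm trying to return the largest value of b.2 for each subgroup of a.2. How can I do this efficiently?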

Answer 1

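I think the ddply and ave approaches are both fairly resource-intensive. ave fails by running out of memory on my current problem (67,608 rows, with four columns defining the unique keys). tapply is a handy choice, but what I usually need is to select the whole rows holding the largest (or smallest) value of something for each unique key, where the key is usually defined by more than one column. The best solution I've found is to sort and then use the negation of duplicated to select only the first row for each unique key. For the simple example here: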

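a <- sample(1:10, 100, replace = TRUE)
b <- sample(1:100, 100, replace = TRUE)
f <- data.frame(a, b)

# Sort by key, then by value descending, so each key's maximum row comes first.
sorted <- f[order(f$a, -f$b), ]
# Keep only the first row for each key.
highs <- sorted[!duplicated(sorted$a), ]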

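I think the performance gain, at least over ave or ddply, is substantial. It is slightly more complicated for multi-column keys, but order accepts any number of sort columns and duplicated works on data frames, so the same approach still applies.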
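As a rough sketch of the multi-column case mentioned above: the data frame f2 and its columns k1, k2 and v are made up for illustration and are not from the original problem.

```r
# Hypothetical data: two key columns (k1, k2) and one value column (v).
set.seed(1)
f2 <- data.frame(
  k1 = sample(1:3, 50, replace = TRUE),
  k2 = sample(c("x", "y"), 50, replace = TRUE),
  v  = sample(1:100, 50, replace = TRUE)
)

# Sort by both keys, then by the value in descending order within each key.
sorted2 <- f2[order(f2$k1, f2$k2, -f2$v), ]

# duplicated() on the key columns flags repeated key combinations, so
# negating it keeps the first (largest-v) row for each combination.
highs2 <- sorted2[!duplicated(sorted2[, c("k1", "k2")]), ]
```

If several rows tie for the maximum within a key, this keeps only one of them.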

Answer 2

library(plyr)
ddply(a.3, "a.2", subset, b.2 == max(b.2))

Answer 3

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

Jonathan Chang's answer gets you what you explicitly asked for, but I'm guessing you want the actual rows from the data frame.

sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]
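One thing worth knowing about this approach is that it keeps every row tied for a group's maximum. A small sketch with made-up data (the data frame d and its columns g and v are hypothetical):

```r
# Small hypothetical data frame with a tie in group 1.
d <- data.frame(g = c(1, 1, 1, 2, 2), v = c(5, 9, 9, 3, 7))

# ave() returns the group maximum aligned to every row, so the comparison
# is TRUE for each row that attains its group's maximum.
sel <- ave(d$v, d$g, FUN = max) == d$v
tied <- d[sel, ]   # rows 2, 3 and 5: both tied rows of group 1 are kept

# To keep a single row per group, drop repeated group values afterwards.
one_per_group <- tied[!duplicated(tied$g), ]   # rows 2 and 5
```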

Answer 4

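a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)
m <- split(a.3, a.2)
# For one group, return the original row number of its largest b.2.
u <- function(x) {
    a <- rownames(x)          # row names carry the indices into a.3
    b <- which.max(x[, 2])    # position of the maximum within the group
    as.numeric(a[b])
}
r <- sapply(m, u)

a.3[r, ]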

This does the trick, although it's a bit cumbersome... but it lets me grab the rows holding the group-wise maxima. Any other ideas?

Answer 5

> a.2<-sample(1:10,100,replace=T)
> b.2<-sample(1:100,100,replace=T)
> tapply(b.2, a.2, max)
 1  2  3  4  5  6  7  8  9 10 
99 92 96 97 98 99 94 98 98 96

Answer 6

a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

Using aggregate, you can get the maximum for each group in one line:

aggregate(a.3, by = list(a.3$a.2), FUN = max)

This produces the following output:

   Group.1 a.2 b.2
1        1   1  96
2        2   2  82
...
8        8   8  85
9        9   9  93
10      10  10  97
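Note that aggregate applies max to each column separately, so with more columns the result need not correspond to any single original row. A possible way to recover the full rows, sketched here with a hypothetical extra column id, is to compute the group maxima of b.2 alone and merge them back:

```r
set.seed(42)
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
# 'id' is a hypothetical extra column standing in for the rest of each row.
a.3 <- data.frame(id = seq_along(a.2), a.2, b.2)

# Group maxima of b.2 only (formula interface of aggregate()).
maxes <- aggregate(b.2 ~ a.2, data = a.3, FUN = max)

# Joining on both columns keeps only the original rows that attain the
# maximum of their group; ties yield more than one row per group.
rows <- merge(a.3, maxes, by = c("a.2", "b.2"))
```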

Selecting the rows with the maximum value of a variable within each group in R
