Aggregating data based on unique triplets in R

Time: 2014-07-24 05:51:25

Tags: r aggregate permutation

For an earlier, related question I was referred to "Counting existing permutations in R", but I could not apply it to my problem. Here is my data:

One <- c(rep("X",6),rep("Y",3),rep("Z",2))
Two <- c(rep("A",4),rep("B",6),rep("C",1))
Three <- c(rep("J",5),rep("K",2),rep("L",4))
Number <- runif(11)


df <- data.frame(One,Two,Three,Number)


   One Two Three     Number
1    X   A     J 0.10511669
2    X   A     J 0.62467760
3    X   A     J 0.24232663
4    X   A     J 0.38358854
5    X   B     J 0.04658226
6    X   B     K 0.26789844
7    Y   B     K 0.07685341
8    Y   B     L 0.21372276
9    Y   B     L 0.13620971
10   Z   B     L 0.49073692
11   Z   C     L 0.52968279

I tried

aggregate(df, df[,c(1:3)],FUN = c(length,mean))

and got

Error in match.fun(FUN) : 
'c(length, mean)' is not a function, character or symbol

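I am trying to aggregate by creating a new data frame that gives the frequency of each unique triplet (One, Two, Three), plus another column holding the median of Number for each unique triplet. So for the (X, A, J) triplet I would want Count = 4, and Median to be the median of the first four numbers under Number.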

3 answers:

Answer 0: (score: 3)

You can use dplyr:

 library(dplyr)
 res <- df%>%
 group_by(One,Two,Three) %>%
 summarize(length=n(), Mean=mean(Number)) #change `mean` to `median` if you want `median`

 str(res)
#Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    7 obs. of  5 variables:
 ----------
  str(as.data.frame(res))
#'data.frame':  7 obs. of  5 variables:
# $ One   : Factor w/ 3 levels "X","Y","Z": 1 1 1 2 2 3 3
# $ Two   : Factor w/ 3 levels "A","B","C": 1 2 2 2 2 2 3
# $ Three : Factor w/ 3 levels "J","K","L": 1 1 2 2 3 3 3
# $ length: int  4 1 1 1 2 1 1
# $ Mean  : num  0.689 0.989 0.524 0.181 0.345 ...

Or with data.table:

library(data.table)
setDT(df)[,list(length=.N, Mean=mean(Number)),by=list(One,Two,Three)]
#      One Two Three length      Mean
# 1:   X   A     J      4 0.3660189
# 2:   X   B     J      1 0.8389641
# 3:   X   B     K      1 0.2815004
# 4:   Y   B     K      1 0.4990414
# 5:   Y   B     L      2 0.3814621
# 6:   Z   B     L      1 0.1144003
# 7:   Z   C     L      1 0.9508751
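
As the comment in the dplyr code notes, swapping `mean` for `median` gives what the question actually asks for. Here is a minimal sketch of that variant, assuming dplyr 1.0 or later (for the `.groups` argument). The `set.seed(1)` call is my addition so the numbers are reproducible, and the column names `Count` and `Median` are borrowed from the question's wording:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original post
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = runif(11)
)

# One row per unique (One, Two, Three) triplet, with its frequency and median
res <- df %>%
  group_by(One, Two, Three) %>%
  summarise(Count = n(), Median = median(Number), .groups = "drop")

res
```

With `.groups = "drop"` the result is an ungrouped tibble, so later operations on `res` are not silently applied per group.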

Answer 1: (score: 0)

OTT <- paste(One,Two,Three)
ott.mean <- tapply(Number,OTT,mean)
ott.count <- tapply(OTT,OTT,length)
cbind(ott.mean,ott.count)
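
The same `tapply` idea can be adapted to the median the question asks for. The sketch below is self-contained and uses only base R. The fixed seed and the names `key` and `out` are my own choices for illustration. The result is a numeric matrix whose row names are the pasted triplet keys, not a data frame with separate One/Two/Three columns:

```r
set.seed(1)  # assumption: fixed seed, not part of the original post
One    <- c(rep("X", 6), rep("Y", 3), rep("Z", 2))
Two    <- c(rep("A", 4), rep("B", 6), rep("C", 1))
Three  <- c(rep("J", 5), rep("K", 2), rep("L", 4))
Number <- runif(11)

# Build one key per row, e.g. "X A J", and summarise Number within each key
key        <- paste(One, Two, Three)
ott.count  <- tapply(Number, key, length)
ott.median <- tapply(Number, key, median)

out <- cbind(Count = ott.count, Median = ott.median)
out
```

A single triplet can then be looked up by its key, for example `out["X A J", ]`.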

Answer 2: (score: 0)

Seems straightforward:

d2 <- aggregate( df$Number, df[ , c(1:3)],
                    FUN = function(x) { c( len=length(x), mn=mean(x) ) } )

@latemail: Not sure what you mean by a 'borked' data.frame. The fourth element is a matrix. Matrices are legitimate components of data frames:

> d2[[4]]

     len        mn
[1,]   4 0.7531795
[2,]   1 0.8777003
[3,]   1 0.8003510
[4,]   1 0.6113566
[5,]   2 0.2470044
[6,]   1 0.3444656
[7,]   1 0.7517357

The matrix can be accessed in the usual way:

> d2[ , 'x'][ , "mn"]
[1] 0.7531795 0.8777003 0.8003510 0.6113566 0.2470044 0.3444656 0.7517357
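
If ordinary columns are preferred over an embedded matrix, one common base R idiom is to rebuild the data frame with `do.call(data.frame, ...)`, which expands the matrix into one column per matrix column. This is only a sketch: the fixed seed and the name `flat` are my assumptions, and `d2` is recreated here so the example is self-contained:

```r
set.seed(1)  # assumption: fixed seed, not part of the original post
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = runif(11)
)

d2 <- aggregate(df$Number, df[, 1:3],
                FUN = function(x) c(len = length(x), mn = mean(x)))

ncol(d2)        # 4: One, Two, Three, and the matrix column x

# Expand the matrix column x into separate columns x.len and x.mn
flat <- do.call(data.frame, d2)
names(flat)
```

After flattening, the summaries are reachable as `flat$x.len` and `flat$x.mn`, without the two-step matrix indexing.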