Aggregating data across columns with duplicate IDs in R

Time: 2014-03-21 16:25:59

Tags: r duplicates aggregate


I have a df like this:

> dat
    gen M1  M1  M1  M1  M2  M2  M2
    G1  150     142 130 105 96  
    G2  150 145 142 130     96  89
    G3  150 145     130 105 96  
    G4      145 142 130 105     89
    G5  150     142 130 105 96  
    G6      145 142 130     96  89
    G7  150     142     105 96  
    G8  150 145     130 105     89
    G9  150 145 142         96  89

Here the data sit in columns with duplicated IDs. I would like to get this:

>dat1
gen M1  M1  M1  M1  agg M2  M2  M2  agg
G1  150     142 130 150/142/130 105 96      105/96
G2  150 145 142 130 150/145/142/130     96  89  96/89
G3  150 145     130 150/145/130 105 96      105/96
G4      145 142 130 145/142/130 105     89  105/89
G5  150     142 130 150/142/130 105 96      105/96
G6      145 142 130 145/142/130     96  89  96/89
G7  150     142     150/142 105 96      105/96
G8  150 145     130 150/145/130 105     89  105/89
G9  150 145 142     150/145/142     96  89  96/89

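Here, in each agg column, I have combined all the values from the columns that share the same ID in the first (header) row.
I would like to create a new column at the end of each run of duplicated columns and put the aggregated values there. I am quite confused about how to do this in R.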

EDIT:
dput(dat)
    structure(list(V1 = structure(c(10L, 1L, 2L, 3L, 4L, 5L, 6L, 
    7L, 8L, 9L), .Label = c("G1", "G2", "G3", "G4", "G5", "G6", "G7", 
    "G8", "G9", "gen"), class = "factor"), V2 = structure(c(2L, 1L, 
    1L, 1L, NA, 1L, NA, 1L, 1L, 1L), .Label = c("150", "M1"), class = "factor"), 
        V3 = structure(c(2L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, 1L), .Label = c("145", 
        "M1"), class = "factor"), V4 = structure(c(2L, 1L, 1L, NA, 
        1L, 1L, 1L, 1L, NA, 1L), .Label = c("142", "M1"), class = "factor"), 
        V5 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, NA), .Label = c("130", 
        "M1"), class = "factor"), V6 = structure(c(2L, 1L, NA, 1L, 
        1L, 1L, NA, 1L, 1L, NA), .Label = c("105", "M2"), class = "factor"), 
        V7 = structure(c(2L, 1L, 1L, 1L, NA, 1L, 1L, 1L, NA, 1L), .Label = c("96", 
        "M2"), class = "factor"), V8 = structure(c(2L, NA, 1L, NA, 
        1L, NA, 1L, NA, 1L, 1L), .Label = c("89", "M2"), class = "factor")), .Names = c("V1", 
    "V2", "V3", "V4", "V5", "V6", "V7", "V8"), class = "data.frame", row.names = c(NA, 
    -10L))

3 answers:

Answer 0: (score: 0)

Aggregate them into a character vector using paste():

 x=data.frame(x1=1:10,x2=1:10,x1=11:20)

 #now notice that r created my x object with three columns x1,x2 and x1.1

 xnew=cbind(x,agg=paste(x$x1,x$x2,x$x1.1,sep="/"))

I'm not sure this is what you want to do, because I'm a bit confused by your data structure.
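
The same paste() idea can be applied without naming each column by hand. The following is only a sketch: it assumes the duplicated names were made unique by data.frame() with numeric suffixes (x1, x1.1, ...), and the variable names base and agg are illustrative, not part of the original answer. The suffix-stripping regex would also alter a column that genuinely ends in ".1", so check the names first.

```r
# Duplicate names are made unique on construction: x1, x2, x1.1
x <- data.frame(x1 = 1:3, x2 = 4:6, x1 = 7:9)

# Strip the numeric suffix to recover the original (duplicated) names
base <- sub("\\.[0-9]+$", "", names(x))

# Paste all columns whose base name is "x1", row by row
agg <- do.call(paste, c(x[base == "x1"], sep = "/"))

# Append the aggregated column, as in the answer above
xnew <- cbind(x, agg = agg)
```

Note that paste() keeps missing values as the literal text "NA", so this variant only suits data without gaps.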

Answer 1: (score: 0)

This works if the missing values are blanks:

dat$agg1 <- apply(dat[,2:5],1,function(x)paste(x[nchar(x)>0],collapse="/"))
dat$agg2 <- apply(dat[,6:8],1,function(x)paste(x[nchar(x)>0],collapse="/"))

dat <- dat[,c(1:5,9,6:8,10)]
dat
#   gen  M1 M1.1 M1.2 M1.3            agg1  M2 M2.1 M2.2   agg2
# 1  G1 150       142  130     150/142/130 105   96      105/96
# 2  G2 150  145  142  130 150/145/142/130       96   89  96/89
# 3  G3 150  145       130     150/145/130 105   96      105/96
# 4  G4      145  142  130     145/142/130 105        89 105/89
# ...

This works if the missing values are NA:

dat$agg1 <- apply(dat[,2:5],1,function(x)paste(x[!is.na(x)],collapse="/"))
dat$agg2 <- apply(dat[,6:8],1,function(x)paste(x[!is.na(x)],collapse="/"))
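
The two variants above hard-code the column positions. A possible generalisation, given here as a sketch under assumptions, groups the columns by their base name and handles both blanks and NA. It assumes the column names look like M1, M1.1, M1.2, ... as in the printed output above; collapse_row and the agg_ prefix are hypothetical names chosen for illustration, and the toy data only reproduce three rows of the question.

```r
# Toy data mirroring rows G1, G2 and G4 of the question; NA marks a gap.
dat <- data.frame(
  gen = c("G1", "G2", "G4"),
  M1 = c(150, 150, NA), M1 = c(NA, 145, 145),
  M1 = c(142, 142, 142), M1 = c(130, 130, 130),
  M2 = c(105, NA, 105), M2 = c(96, 96, NA), M2 = c(NA, 89, 89)
)

# Hypothetical helper: join the non-missing, non-blank values of one row.
collapse_row <- function(x) {
  x <- trimws(as.character(x))
  paste(x[!is.na(x) & nzchar(x)], collapse = "/")
}

# Recover the duplicated IDs by stripping the ".1", ".2", ... suffixes.
base <- sub("\\.[0-9]+$", "", names(dat))
groups <- setdiff(unique(base), "gen")

# For each ID, take its block of columns and append an agg column to it.
pieces <- lapply(groups, function(g) {
  block <- dat[, base == g, drop = FALSE]
  block[[paste0("agg_", g)]] <- apply(block, 1, collapse_row)
  block
})

out <- cbind(dat["gen"], do.call(cbind, pieces))
out$agg_M1  # "150/142/130" "150/145/142/130" "145/142/130"
out$agg_M2  # "105/96" "96/89" "105/89"
```

Because each agg column is attached to its own block before the blocks are bound together, it ends up directly after the columns it summarises, with no manual reordering step.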

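Answer 2: (score: 0)

Here is my script... I know some of you could do this more simply and elegantly! I transposed my df (a simplified example) and read it in as a table.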

    > dat<-read.table("dat.txt", header=T, sep="\t", na.strings="")
    > dat
       gen  A  B  C  D
    1   M1  1 NA  3 NA
    2   M1 NA  6 NA  3
    3   M1  4  8 NA NA
    4   M1 NA NA  6  3
    5   M2  8 NA  6 NA
    6   M2 NA  2 NA  6
    7   M3  3  8 NA  2
    8   M3  8  9  5 NA
    9   M4  3  7  8  5
    10  M4  5 NA  3  2
    > final<-NULL
    > for(i in 1:4){
    +   mar<-as.character(dat[1,1])
    +   dat1<-dat[dat[,1]%in% c(mar),]
    +   dat <- dat[!dat[,1]%in% c(mar),]
    +   dat2 <- apply(dat1,2,function(x)paste(x[!is.na(x)],collapse="/"))
    +   dat2$gen<-mar
    +   dat3<-rbind(dat1,dat2)
    +   final<-rbind(final, dat3)
    + }
    Warning messages:
    1: In dat2$gen <- mar : Coercing LHS to a list
    2: In dat2$gen <- mar : Coercing LHS to a list
    3: In dat2$gen <- mar : Coercing LHS to a list
    4: In dat2$gen <- mar : Coercing LHS to a list
    > final
       gen     A     B     C     D
    1   M1     1  <NA>     3  <NA>
    2   M1  <NA>     6  <NA>     3
    3   M1     4     8  <NA>  <NA>
    4   M1  <NA>  <NA>     6     3
    5   M1  1/ 4  6/ 8  3/ 6  3/ 3
    51  M2     8  <NA>     6  <NA>
    6   M2  <NA>     2  <NA>     6
    31  M2     8     2     6     6
    7   M3     3     8  <NA>     2
    8   M3     8     9     5  <NA>
    32  M3   3/8   8/9     5     2
    9   M4     3     7     8     5
    10  M4     5  <NA>     3     2
    33  M4   3/5     7   8/3   5/2
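
For the transposed (long) layout, the per-group summary rows can also be computed without a loop. This is a sketch using base R's aggregate(), not the method of the script above: it produces one collapsed row per gen value instead of interleaving the summaries with the original rows, and collapse_col is a hypothetical helper name. The toy data are the first six rows of the example.

```r
# First six rows of the example above (groups M1 and M2)
dat <- data.frame(
  gen = c("M1", "M1", "M1", "M1", "M2", "M2"),
  A = c(1, NA, 4, NA, 8, NA),
  B = c(NA, 6, 8, NA, NA, 2),
  C = c(3, NA, NA, 6, 6, NA),
  D = c(NA, 3, NA, 3, NA, 6)
)

# Hypothetical helper: join the non-NA values of one column within a group
collapse_col <- function(x) paste(x[!is.na(x)], collapse = "/")

# One row per gen; each value column is collapsed within its group
agg <- aggregate(dat[-1], by = list(gen = dat$gen), FUN = collapse_col)
agg
```

Working on the numeric columns directly also avoids the padded strings such as "1/ 4" seen in the output above, which come from apply() formatting a mixed data frame as a character matrix.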