I have a data frame like this:
> dat
gen  M1   M1   M1   M1   M2   M2   M2
G1   150       142  130  105  96
G2   150  145  142  130       96   89
G3   150  145       130  105  96
G4        145  142  130  105       89
G5   150       142  130  105  96
G6        145  142  130       96   89
G7   150       142       105  96
G8   150  145       130  105       89
G9   150  145  142            96   89
Here, the data sit under repeated column IDs. I would like to get this:
>dat1
gen  M1   M1   M1   M1   agg              M2   M2   M2   agg
G1   150       142  130  150/142/130      105  96        105/96
G2   150  145  142  130  150/145/142/130       96   89   96/89
G3   150  145       130  150/145/130      105  96        105/96
G4        145  142  130  145/142/130      105       89   105/89
G5   150       142  130  150/142/130      105  96        105/96
G6        145  142  130  145/142/130           96   89   96/89
G7   150       142       150/142          105  96        105/96
G8   150  145       130  150/145/130      105       89   105/89
G9   150  145  142       150/145/142           96   89   96/89
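Here, in each agg column, I have aggregated all the values from the columns that share the same ID in the first (header) row.
I would like to create a new column at the end of each set of repeated columns that holds this aggregate.
I am quite confused about how to do this in R.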
EDIT:
dput(dat)
structure(list(V1 = structure(c(10L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L), .Label = c("G1", "G2", "G3", "G4", "G5", "G6", "G7",
"G8", "G9", "gen"), class = "factor"), V2 = structure(c(2L, 1L,
1L, 1L, NA, 1L, NA, 1L, 1L, 1L), .Label = c("150", "M1"), class = "factor"),
V3 = structure(c(2L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, 1L), .Label = c("145",
"M1"), class = "factor"), V4 = structure(c(2L, 1L, 1L, NA,
1L, 1L, 1L, 1L, NA, 1L), .Label = c("142", "M1"), class = "factor"),
V5 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, NA), .Label = c("130",
"M1"), class = "factor"), V6 = structure(c(2L, 1L, NA, 1L,
1L, 1L, NA, 1L, 1L, NA), .Label = c("105", "M2"), class = "factor"),
V7 = structure(c(2L, 1L, 1L, 1L, NA, 1L, 1L, 1L, NA, 1L), .Label = c("96",
"M2"), class = "factor"), V8 = structure(c(2L, NA, 1L, NA,
1L, NA, 1L, NA, 1L, 1L), .Label = c("89", "M2"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8"), class = "data.frame", row.names = c(NA,
-10L))
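Note for readers working from the dput() output: it stores the header (gen, M1, ...) as the first data row and every column as a factor, whereas the answers assume real column names. A minimal sketch of promoting that row to column names follows. The three-row, four-column dat here is a hypothetical stand-in for the full data, and hdr and body are illustrative names.

```r
# Hypothetical stand-in for the dput() data: header in row 1, all factors.
dat <- data.frame(
  V1 = c("gen", "G1", "G2"),
  V2 = c("M1", "150", "150"),
  V3 = c("M1", NA, "145"),
  V6 = c("M2", "105", NA),
  stringsAsFactors = TRUE
)

# Pull the first row out as plain strings to use as the header.
hdr <- vapply(dat[1, ], as.character, character(1))

# Drop the header row and convert the remaining factors to character.
body <- dat[-1, ]
body[] <- lapply(body, as.character)

# make.unique() yields M1, M1.1, ... for the repeated IDs.
names(body) <- make.unique(unname(hdr))
rownames(body) <- NULL
```

After this, body has the columns gen, M1, M1.1 and M2, with missing cells kept as NA.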
Answer 0 (score: 0)
To aggregate them into a character vector, use paste():
x = data.frame(x1=1:10, x2=1:10, x1=11:20)
# now notice that R created my x object with three columns: x1, x2 and x1.1
xnew = cbind(x, agg=paste(x$x1, x$x2, x$x1.1, sep="/"))
I'm not sure whether this is what you want to do, because I'm a bit confused by the structure of your data.
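One thing to watch when applying paste() to data like the question's: a missing value is pasted as the literal string "NA" rather than being skipped. A small sketch with a made-up vector x (not taken from the question) shows the difference between pasting everything and pasting only the non-missing entries:

```r
# Made-up row of marker values with one missing entry.
x <- c(150, NA, 142)

# Pasting everything keeps the NA as text.
naive <- paste(x, collapse = "/")

# Dropping NA first gives the form asked for in the question.
cleaned <- paste(x[!is.na(x)], collapse = "/")

print(naive)    # "150/NA/142"
print(cleaned)  # "150/142"
```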
Answer 1 (score: 0)
If the missing values are blank (empty strings), this works:
dat$agg1 <- apply(dat[,2:5],1,function(x)paste(x[nchar(x)>0],collapse="/"))
dat$agg2 <- apply(dat[,6:8],1,function(x)paste(x[nchar(x)>0],collapse="/"))
dat <- dat[,c(1:5,9,6:8,10)]
dat
#   gen  M1   M1.1 M1.2 M1.3 agg1             M2   M2.1 M2.2 agg2
# 1 G1   150       142  130  150/142/130      105  96        105/96
# 2 G2   150  145  142  130  150/145/142/130       96   89   96/89
# 3 G3   150  145       130  150/145/130      105  96        105/96
# 4 G4        145  142  130  145/142/130      105       89   105/89
# ...
If the missing values are NA:
dat$agg1 <- apply(dat[,2:5],1,function(x)paste(x[!is.na(x)],collapse="/"))
dat$agg2 <- apply(dat[,6:8],1,function(x)paste(x[!is.na(x)],collapse="/"))
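The column positions above (2:5, 6:8) are hard-coded. As a possible generalisation, the sketch below derives the groups from the column names instead and places each aggregate right after its group. It assumes the names follow the M1, M1.1, ... pattern shown in the output above. The small dat, the helper collapse_row and the "_agg" suffix are illustrative choices, not part of the original answer.

```r
# Small stand-in for the question's data (three of the nine rows).
dat <- data.frame(
  gen  = c("G1", "G2", "G7"),
  M1   = c(150, 150, 150),
  M1.1 = c(NA, 145, NA),
  M1.2 = c(142, 142, 142),
  M2   = c(105, NA, 105),
  M2.1 = c(96, 96, 96),
  M2.2 = c(NA, 89, NA)
)

# Join the non-missing, non-empty entries of one row with "/".
collapse_row <- function(x) {
  x <- as.character(x)
  paste(x[!is.na(x) & nzchar(x)], collapse = "/")
}

# Strip the ".1", ".2" suffixes to recover each column's marker ID.
marker <- sub("\\.[0-9]+$", "", names(dat)[-1])

# Rebuild the table group by group, appending the aggregate after each group.
out <- dat["gen"]
for (m in unique(marker)) {
  block <- dat[-1][marker == m]   # all columns belonging to this marker
  out <- cbind(out, block)
  out[[paste0(m, "_agg")]] <- apply(block, 1, collapse_row)
}
```

Here out has the columns gen, M1, M1.1, M1.2, M1_agg, M2, M2.1, M2.2, M2_agg, so nothing needs reordering afterwards. Because the filter checks both is.na() and nzchar(), the same helper covers the blank and the NA case.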
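Answer 2 (score: 0)
Here is my script... I know some of you could do this more simply and elegantly! I converted my df (into a simple example) and read it in as a table.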
> dat<-read.table("dat.txt", header=T, sep="\t", na.strings="")
> dat
   gen  A  B  C  D
1   M1  1 NA  3 NA
2   M1 NA  6 NA  3
3   M1  4  8 NA NA
4   M1 NA NA  6  3
5   M2  8 NA  6 NA
6   M2 NA  2 NA  6
7   M3  3  8 NA  2
8   M3  8  9  5 NA
9   M4  3  7  8  5
10  M4  5 NA  3  2
> final<-NULL
> for(i in 1:4){
+ mar<-as.character(dat[1,1])
+ dat1<-dat[dat[,1]%in% c(mar),]
+ dat <- dat[!dat[,1]%in% c(mar),]
+ dat2 <- apply(dat1,2,function(x)paste(x[!is.na(x)],collapse="/"))
+ dat2$gen<-mar
+ dat3<-rbind(dat1,dat2)
+ final<-rbind(final, dat3)
+ }
Warning messages:
1: In dat2$gen <- mar : Coercing LHS to a list
2: In dat2$gen <- mar : Coercing LHS to a list
3: In dat2$gen <- mar : Coercing LHS to a list
4: In dat2$gen <- mar : Coercing LHS to a list
> final
   gen    A    B    C    D
1   M1    1 <NA>    3 <NA>
2   M1 <NA>    6 <NA>    3
3   M1    4    8 <NA> <NA>
4   M1 <NA> <NA>    6    3
5   M1 1/ 4 6/ 8 3/ 6 3/ 3
51  M2    8 <NA>    6 <NA>
6   M2 <NA>    2 <NA>    6
31  M2    8    2    6    6
7   M3    3    8 <NA>    2
8   M3    8    9    5 <NA>
32  M3  3/8  8/9    5    2
9   M4    3    7    8    5
10  M4    5 <NA>    3    2
33  M4  3/5    7  8/3  5/2
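The warnings in the loop arise because apply() returns a plain character vector, so dat2$gen <- mar has to coerce it to a list. The stray spaces in entries such as "1/ 4" appear to come from apply() converting the mixed-type data frame to a character matrix with padded numbers. One way to avoid both is to collapse each column per group with aggregate(). The sketch below runs on a reduced version of the example (first six rows, columns A and B only). The helper collapse_col is an illustrative name, and unlike the loop, this yields only the per-group summary rows.

```r
# Reduced version of the example data: first six rows, columns A and B.
dat <- data.frame(
  gen = c("M1", "M1", "M1", "M1", "M2", "M2"),
  A   = c(1, NA, 4, NA, 8, NA),
  B   = c(NA, 6, 8, NA, NA, 2)
)

# Join the non-missing values of one column within one group.
collapse_col <- function(x) paste(x[!is.na(x)], collapse = "/")

# One summary row per value of gen; every other column is collapsed.
agg <- aggregate(dat[-1], by = dat["gen"], FUN = collapse_col)
print(agg)
```

If the original rows are wanted as well, the summary rows in agg can be appended to dat with rbind() once the value columns of dat have been converted to character.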