I have a data.frame that looks like this:
    mi      pos L08.92.s1 L08.92.s2 LD09.911.s1 LD09.911.s2 Storn.s1 Storn.s2 Storn.s3 Storn.s4 Tre
1 snp1 12713760        CC        CT          CC          CC       TT       TT       TT       TT  CC
2 snp2  8219379        AA        AA          --          AA       AA       AA       AA       AA  --
3 snp3  6595215        GG        GG          GG          GG       GG       --       GG       GT  GT
4 snp4 42348146        CC        CC          CC          CC       CC       CA       --       CA  AA
5 snp5  1809563        GG        GG          TT          TG       GG       GG       GG       GG  TT
6 snp6 34285723        TT        CC          --          --       TT       TT       TT       TT  CC
7 snp7 21533194        AA        AA          AG          --       AA       GG       GG       GG  AG
I want the final data frame to look like this:
    mi      pos L08.92 LD09.911 Storn Tre
1 snp1 12713760     CC       CC    TT  CC
2 snp2  8219379     AA       AA    AA  --
3 snp3  6595215     GG       GG    GG  GT
4 snp4 42348146     CC       CC    CC  AA
5 snp5  1809563     GG       TT    GG  TT
6 snp6 34285723     HH       --    TT  CC
7 snp7 21533194     AA       AG    HH  AG
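Procedure: for each sample, the replicate columns should be collapsed into a single column. The value is derived from the replicates as follows:
Thanks for your help!
Answer 0 (score: 4)
You could try: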
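# Sample name = column name without its replicate suffix (".s1", ".s2", ...)
indx <- gsub("[.][^.]+$", "", colnames(df)[-(1:2)])
# Replicate columns per sample; fixed factor levels keep the original column order
lst <- split(colnames(df)[-(1:2)], factor(indx, levels = unique(indx)))
Un <- c('AA', 'CC', 'GG', 'TT')  # homozygous calls
df2 <- df[, 1:2]
df2[names(lst)] <- lapply(lst, function(x)
  apply(df[x], 1, function(y) {
    y1 <- unique(y)
    y2 <- y1[y1 %in% Un]  # distinct homozygous calls among the replicates
    if (length(y2) == 0) {
      sort(y1, decreasing = TRUE)[1]  # no homozygous call: take the call that sorts last, e.g. 'AG' over '--'
    } else if (length(y2) == 2) {
      'HH'  # two conflicting homozygous calls
    } else {
      y2[1]  # otherwise keep the (first) homozygous call
    }
  }))
df2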
#    mi      pos L08.92 LD09.911 Storn Tre
#1 snp1 12713760     CC       CC    TT  CC
#2 snp2  8219379     AA       AA    AA  --
#3 snp3  6595215     GG       GG    GG  GT
#4 snp4 42348146     CC       CC    CC  AA
#5 snp5  1809563     GG       TT    GG  TT
#6 snp6 34285723     HH       --    TT  CC
#7 snp7 21533194     AA       AG    HH  AG
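The grouping step is the part most likely to go wrong on other data, so here is a small sketch of it in isolation. It uses only the column names from the question. The stricter pattern at the end rests on my assumption that replicate suffixes always look like `.s<number>`; the names `sample_id`, `groups` and `strict_id` are just illustrative.

```r
# Column names as in the question (the two identifier columns dropped)
cols <- c("L08.92.s1", "L08.92.s2", "LD09.911.s1", "LD09.911.s2",
          "Storn.s1", "Storn.s2", "Storn.s3", "Storn.s4", "Tre")

# Strip everything from the last dot onward; "Tre" has no dot and is unchanged
sample_id <- sub("[.][^.]+$", "", cols)

# Group the replicate columns per sample, keeping the original column order
groups <- split(cols, factor(sample_id, levels = unique(sample_id)))
groups

# Assumption: replicate suffixes always have the form ".s<number>".
# Anchoring on that form avoids truncating a name such as "L08.92",
# which contains a dot but carries no replicate suffix.
strict_id <- sub("[.]s[0-9]+$", "", c("L08.92.s1", "L08.92", "Tre"))
strict_id
```

With the looser pattern `[.][^.]+$`, an unreplicated column named `L08.92` would be shortened to `L08`, so the stricter form may be safer if such columns can occur.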
Data:
df <- structure(list(mi = c("snp1", "snp2", "snp3", "snp4", "snp5",
"snp6", "snp7"), pos = c(12713760L, 8219379L, 6595215L, 42348146L,
1809563L, 34285723L, 21533194L), L08.92.s1 = c("CC", "AA", "GG",
"CC", "GG", "TT", "AA"), L08.92.s2 = c("CT", "AA", "GG", "CC",
"GG", "CC", "AA"), LD09.911.s1 = c("CC", "--", "GG", "CC", "TT",
"--", "AG"), LD09.911.s2 = c("CC", "AA", "GG", "CC", "TG", "--",
"--"), Storn.s1 = c("TT", "AA", "GG", "CC", "GG", "TT", "AA"),
Storn.s2 = c("TT", "AA", "--", "CA", "GG", "TT", "GG"), Storn.s3 = c("TT",
"AA", "GG", "--", "GG", "TT", "GG"), Storn.s4 = c("TT", "AA",
"GT", "CA", "GG", "TT", "GG"), Tre = c("CC", "--", "GT",
"AA", "TT", "CC", "AG")), .Names = c("mi", "pos", "L08.92.s1",
"L08.92.s2", "LD09.911.s1", "LD09.911.s2", "Storn.s1", "Storn.s2",
"Storn.s3", "Storn.s4", "Tre"), class = "data.frame", row.names = c(NA,
-7L))
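If it helps readability, the per-row rule can also be pulled out into a named function. This is only a sketch of the rule as I read it from the expected output, not something stated in the question. `consensus_call` and its arguments are hypothetical names. It differs from the code above in two small ways: two or more distinct homozygous calls all give `HH`, and when there is no homozygous call it takes the first non-missing call in column order instead of relying on `sort()`.

```r
# Hypothetical helper: collapse the replicate calls of one SNP for one sample.
consensus_call <- function(calls,
                           homozygous = c("AA", "CC", "GG", "TT"),
                           missing_code = "--") {
  calls <- unique(calls)                    # also drops any names
  hom <- calls[calls %in% homozygous]       # distinct homozygous calls
  if (length(hom) >= 2) return("HH")        # conflicting homozygous calls
  if (length(hom) == 1) return(hom)         # a single homozygous call wins
  het <- setdiff(calls, missing_code)       # remaining non-missing calls
  if (length(het) > 0) return(het[1])       # first heterozygous call
  missing_code                              # every replicate was missing
}

consensus_call(c("CC", "CT"))   # homozygous call preferred over heterozygous
consensus_call(c("TT", "CC"))   # conflicting homozygous calls
consensus_call(c("AG", "--"))   # heterozygous call preferred over missing
consensus_call(c("--", "--"))   # nothing but missing calls
```

With `df`, `df2` and `lst` from the answer, something like `df2[names(lst)] <- lapply(lst, function(x) apply(df[x], 1, consensus_call))` should give the same table for the example data.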