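I have this data frame (mydf; strictly, a character matrix). I need to match the letters (DNA letters) in the REF and ALT columns against the column names ("A", "T", "G", "C") and paste the corresponding values together as "REF,ALT". However, some rows have TYPE values that match "snp:+[0-9]" or "flat$". For the "flat$" rows, I want to add up the ALT values of the "snp:+[0-9]" rows with the same "start" id and paste that total in as the ALT part of "REF,ALT" (the REF value is the same for the "snp:+[0-9]" and "flat$" rows that share a start id), giving the output shown in the result. How can I do this?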
mydf<-structure(c("chr20:5363934", "chr5:8529759", "chr14:9620689",
"chr18:547375", "chr8:5952145", "chr14:8694382", "chr16:2530921",
"chr16:2530921", "chr16:2530921", "chr14:4214117", "chr4:7799768",
"chr3:9141263", "95", "24", "65", "94", "27", "68", "49", "49",
"49", "73", "36", "27", "29", " 1", "49", " 1", "80", "94", "15",
"15", "15", "49", "28", "41", "14", "28", "41", "51", "25", "26",
"79", "79", "79", "18", " 1", "93", "59", "41", "96", "67", "96",
"30", "72", "72", "72", "77", "16", "90", "C", "G", "T", "G",
"T", "A", "A", "A", "A", "G", "C", "A", "T", "C", "G", "C", "T",
"A", "T", "G", "T", "A", "A", "A", "snp", "snp", "snp", "snp",
"snp", "snp", "snp:2530921", "snp:2530921", "snp:flat", "snp", "snp", "snp"), .Dim = c(12L,
8L), .Dimnames = list(NULL, c("start", "A", "T", "G", "C", "REF",
"ALT", "TYPE")))
Result
start A T G C REF ALT TYPE AD
[1,] "chr20:5363934" "95" "29" "14" "59" "C" "T" "snp" "59,29"
[2,] "chr5:8529759" "24" " 1" "28" "41" "G" "C" "snp" "28,41"
[3,] "chr14:9620689" "65" "49" "41" "96" "T" "G" "snp" "49,41"
[4,] "chr18:547375" "94" " 1" "51" "67" "G" "C" "snp" "51,67"
[5,] "chr8:5952145" "27" "80" "25" "96" "T" "T" "snp" "80,80"
[6,] "chr14:8694382" "68" "94" "26" "30" "A" "A" "snp" "68,68"
[7,] "chr16:2530921" "49" "15" "79" "72" "A" "T" "snp:2530921" "49,15"
[8,] "chr16:2530921" "49" "15" "79" "72" "A" "G" "snp:2530921" "49,79"
[9,] "chr16:2530921" "49" "15" "79" "72" "A" "T" "snp:flat" "49,94"
[10,] "chr14:4214117" "73" "49" "18" "77" "G" "A" "snp" "18,73"
[11,] "chr4:7799768" "36" "28" " 1" "16" "C" "A" "snp" "16,36"
[12,] "chr3:9141263" "27" "41" "93" "90" "A" "A" "snp" "27,27"
Answer 0 (score: 3)
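# Column position of each REF and ALT letter (all REF rows first, then all ALT rows)
indx <- sapply(mydf[,c("REF", "ALT")], function(x) match(x, colnames(mydf)))
# Rows whose TYPE contains "flat"
flat <- grepl("flat", mydf[,"TYPE"])
# Look up each (row, column) pair and reshape to two columns: REF value, ALT value
x <- `dim<-`(mydf[cbind(rep(1:nrow(mydf), 2), indx)], c(nrow(mydf), 2))
# Start ids and ALT values of the non-flat rows that share a start id with a flat row
add_ids <- mydf[,"start"][mydf[,"start"] %in% mydf[,"start"][flat] & !flat]
toadd <- x[,2][mydf[,"start"] %in% mydf[,"start"][flat] & !flat]
# Sum those ALT values per start id and store the totals in the flat rows
x[,2][flat] <- tapply(as.numeric(toadd), factor(add_ids, levels=unique(add_ids)), sum)
# Append the pasted "REF,ALT" strings as a new (unnamed) column
cbind(mydf, paste(x[,1], x[,2],sep=","))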
# start A T G C REF ALT TYPE
# [1,] "chr20:5363934" "95" "29" "14" "59" "C" "T" "snp" "59,29"
# [2,] "chr5:8529759" "24" " 1" "28" "41" "G" "C" "snp" "28,41"
# [3,] "chr14:9620689" "65" "49" "41" "96" "T" "G" "snp" "49,41"
# [4,] "chr18:547375" "94" " 1" "51" "67" "G" "C" "snp" "51,67"
# [5,] "chr8:5952145" "27" "80" "25" "96" "T" "T" "snp" "80,80"
# [6,] "chr14:8694382" "68" "94" "26" "30" "A" "A" "snp" "68,68"
# [7,] "chr16:2530921" "49" "15" "79" "72" "A" "T" "snp:2530921" "49,15"
# [8,] "chr16:2530921" "49" "15" "79" "72" "A" "G" "snp:2530921" "49,79"
# [9,] "chr16:2530921" "49" "15" "79" "72" "A" "T" "snp:flat" "49,94"
# [10,] "chr14:4214117" "73" "49" "18" "77" "G" "A" "snp" "18,73"
# [11,] "chr4:7799768" "36" "28" " 1" "16" "C" "A" "snp" "16,36"
# [12,] "chr3:9141263" "27" "41" "93" "90" "A" "A" "snp" "27,27"
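We first create an index that matches each REF and ALT letter to the correct column. Then we create a logical index that locates the rows with "flat" in their TYPE. Next we extract all the matched values as a single vector and give it dimensions, so that it becomes a two-column matrix of REF and ALT values.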
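The lookup step can be seen in isolation on a much smaller matrix. The following is only a minimal sketch: the matrix `m` and the names `col_idx`, `row_idx`, `counts` and `ad` are made up for illustration and do not appear in the question. It relies on the fact that indexing a matrix with a two-column (row, column) matrix returns one cell per pair.

```r
# Toy character matrix (hypothetical data, not taken from the question)
m <- matrix(c("10", "20",   # column A
              "30", "40",   # column T
              "A",  "T",    # column REF
              "T",  "A"),   # column ALT
            nrow = 2,
            dimnames = list(NULL, c("A", "T", "REF", "ALT")))

# For each REF/ALT letter, find the position of the column holding its value
# (as.vector flattens column-wise: all REF letters first, then all ALT letters)
col_idx <- match(as.vector(m[, c("REF", "ALT")]), colnames(m))

# Pair every row number with its matched column; a two-column index matrix
# extracts exactly one cell per (row, column) pair
row_idx <- rep(seq_len(nrow(m)), times = 2)
vals <- m[cbind(row_idx, col_idx)]

# Reshape to one row per record: column 1 = REF value, column 2 = ALT value
counts <- matrix(vals, nrow = nrow(m))
ad <- paste(counts[, 1], counts[, 2], sep = ",")
print(ad)
```

Row 1 has REF "A" and ALT "T", so it picks up the values from columns A and T of that row; row 2 does the same with the letters swapped.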
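To sum the values for the ids that have "flat" as their TYPE, we first identify the rows that match those ids and the values themselves. We then assign the sums to the appropriate slots in the ALT column and bind everything together.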
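The summing step might also be sketched on plain vectors, as below. This is an illustrative variant rather than the code from the answer: the vectors `start`, `type` and `alt` are hypothetical toy data, and the totals are looked up by start id instead of being assigned by position, which is an assumption made here so that the result does not depend on the order in which `tapply` returns its groups.

```r
# Hypothetical toy vectors standing in for the matrix columns
start <- c("s1", "s2", "s2", "s2", "s3")
type  <- c("snp", "snp:2", "snp:2", "snp:flat", "snp")
alt   <- c("5", "15", "79", "15", "7")   # ALT values, stored as character

# Rows whose TYPE ends in "flat"
flat <- grepl("flat$", type)

# Non-flat rows that share a start id with some flat row
partner <- start %in% start[flat] & !flat

# Total of the partners' ALT values per start id (a named 1-d array)
sums <- tapply(as.numeric(alt[partner]), start[partner], sum)

# Look up each flat row's total by its start id and overwrite its ALT value
alt[flat] <- as.character(sums[start[flat]])
print(alt)
```

Here the flat row for "s2" receives 15 + 79 = 94, mirroring row 9 of the expected result in the question.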