My starting table looks like this:
CHROM POS REF ALT GT
1: 1 58211 A G 1/1
2: 1 6464767 CAAATAAATAAATAAATAAATAAAT C,CAAATAAATAAATAAATAAATAAATAAATAAAT 1/2
3: 12 83011 T C 0/1
4: 18 1541042 C T,A 1/2
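I want to apply a function, "ap2", that splits the long REF and ALT entries in row 2 into two shorter entries: it should update row 2 (changing REF, ALT and GT) and insert a new row (#3, with a new POS, ALT and GT). The result would look like this: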
CHROM POS REF ALT GT
1: 1 58211 A G 1/1
2: 1 6464767 CAAATAAATAAATAAATAAATAAAT C 1/2
3: 1 6464791 T TAAATAAAT 1/2
4: 12 83011 T C 0/1
5: 18 1541042 C T,A 1/2
If I run the ap2 function on its own, it shows the expected result (columns V1-V4):
tmp[,ap2(POS,REF,ALT,GT), by=c("CHROM","POS","REF","ALT","GT")]
CHROM POS REF ALT GT V1 V2 V3 V4
1: 1 6464767 CAAATAAATAAATAAATAAATAAAT C,CAAATAAATAAATAAATAAATAAATAAATAAAT 1/2 6464767 CAAATAAATAAATAAATAAATAAAT C 0/1
2: 1 6464767 CAAATAAATAAATAAATAAATAAAT C,CAAATAAATAAATAAATAAATAAATAAATAAAT 1/2 6464791 T TAAATAAAT 0/1
3: 18 1541042 C T,A 1/2 1541042 C T,A 1/2
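Note that this output only contains the rows with GT 1/2. As far as I can tell, that is because ap2 has no else branch and therefore returns NULL for every other genotype, and data.table drops groups whose j expression evaluates to NULL. A minimal sketch of that behaviour on a made-up table (dt, flag and v are hypothetical names, not part of my data):

```r
library(data.table)

# Hypothetical toy table, not the tmp table from the question
dt <- data.table(id = c("a", "b", "c"), flag = c(TRUE, FALSE, TRUE))

# j returns a list only when flag is TRUE; otherwise the `if` yields NULL,
# and the group for id "b" does not appear in the result
out <- dt[, if (flag) list(v = 1L), by = id]
print(out)
```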
However, if I try to update the original columns in place, I get these warnings:
tmp[, c("POS","REF","ALT","GT") := ap2(POS,REF,ALT,GT), by=c("CHROM","POS","REF","ALT","GT")]
Warning messages:
1: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 1 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
2: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 2 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
3: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 3 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
4: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 4 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
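My reading of these warnings is that `:=` only writes into the rows a group already has, so it cannot add the extra row, whereas a plain j expression may return more rows than the group started with. A small sketch of the difference, using a made-up table (dt, x and y are hypothetical names):

```r
library(data.table)

# Hypothetical toy table, not the tmp table from the question
dt <- data.table(id = c("a", "b"), x = c(1L, 2L))

# Plain j: group "b" returns two values, so the result has 3 rows
grown <- dt[, .(y = if (id == "b") c(x, x + 10L) else x), by = id]
print(grown)

# `:=` assigns into the existing rows of each group, so the row count is fixed
dt[, y := x * 2L, by = id]
print(nrow(dt))
```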
Here is the code that creates my tmp data.table:
library(data.table)
tmp <- data.table(
CHROM = as.character(c("1","1","12","18")) ,
POS = as.integer(c(58211,6464767,83011,1541042)) ,
REF = c("A","CAAATAAATAAATAAATAAATAAAT","T","C") ,
ALT = c("G","C,CAAATAAATAAATAAATAAATAAATAAATAAAT","C","T,A") ,
GT = c("1/1","1/2","0/1","1/2")
)
This is the function I am trying to apply:
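ap2 <- function(pos,ref,alt,gt) {
  # Only heterozygous calls with two ALT alleles are split;
  # for any other genotype the function implicitly returns NULL
  if(gt=="1/2") {
    alt.split <- unlist(strsplit(alt,","))
    # match.length is -1 where REF does not occur in the ALT allele
    matching <- attr(regexpr(ref,alt.split), "match.length")
    if(max(matching) == -1) {
      # No ALT allele contains REF: return the row unchanged
      list(pos,ref,alt,gt)
    } else {
      alt.new <- NULL
      ref.new <- NULL
      pos.new <- NULL
      gt.new <- NULL
      for(i in seq_along(matching)) {
        stopPos <- matching[i]
        if(stopPos == -1) {
          # This allele keeps the original POS and REF
          pos.new <- c(pos.new,as.integer(pos))
          ref.new <- c(ref.new,ref)
          alt.new <- c(alt.new,alt.split[i])
        } else {
          # Shift POS to the last base of REF and trim the shared prefix
          pos.new <- c(pos.new, as.integer(pos+matching[i]-1))
          ref.new <- c(ref.new, substring(ref,stopPos))
          alt.new <- c(alt.new, substring(alt.split[i],stopPos))
        }
        gt.new <- c(gt.new, "0/1")
      }
      list(pos.new, ref.new, alt.new, gt.new)
    }
  }
}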
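For reference, here is a rough sketch of the kind of result I am after, built as a new table instead of with `:=`. It relies on assumptions of mine: split_alleles is a hypothetical helper, not ap2 itself, and it uses a simplified rule (an ALT allele is shortened only when it starts with the full REF string). It always returns a named list, so rows that are not split pass through unchanged. Like ap2, it sets GT to "0/1" on the split rows.

```r
library(data.table)

tmp <- data.table(
  CHROM = c("1", "1", "12", "18"),
  POS   = c(58211L, 6464767L, 83011L, 1541042L),
  REF   = c("A", "CAAATAAATAAATAAATAAATAAAT", "T", "C"),
  ALT   = c("G", "C,CAAATAAATAAATAAATAAATAAATAAATAAAT", "C", "T,A"),
  GT    = c("1/1", "1/2", "0/1", "1/2")
)

# Hypothetical helper (not from the question): simplified version of ap2
# that never returns NULL
split_alleles <- function(pos, ref, alt, gt) {
  unchanged <- list(POS = pos, REF = ref, ALT = alt, GT = gt)
  if (gt != "1/2") return(unchanged)
  alt.split <- strsplit(alt, ",", fixed = TRUE)[[1]]
  # Assumption: only ALT alleles that start with the full REF are shortened
  extends <- startsWith(alt.split, ref)
  if (!any(extends)) return(unchanged)
  cut <- nchar(ref)
  list(
    POS = ifelse(extends, pos + cut - 1L, pos),
    REF = ifelse(extends, substring(ref, cut), ref),
    ALT = ifelse(extends, substring(alt.split, cut), alt.split),
    GT  = rep("0/1", length(alt.split))
  )
}

# Group by a row index so each call sees exactly one row; a group may
# return two rows, which is what lets the table grow
res <- tmp[, split_alleles(POS, REF, ALT, GT),
           by = .(CHROM, row = seq_len(nrow(tmp)))]
res[, row := NULL]
print(res)
```

Grouping by a row index instead of by all five columns keeps the returned POS, REF, ALT and GT from clashing with the grouping columns, at the cost of building a new table rather than updating tmp by reference.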