I'm trying to use dcast to reshape nucleotide frequencies from long format to wide format, as follows:
```r
library(reshape2)  # dcast() from the reshape2 package

res <- read.table(text = 'seqnames pos strand nucleotide count which_label V3 REF
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000086465 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000169927 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000038191 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000086465 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000169927 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000038191 T', header = TRUE)
res
# seqnames pos strand nucleotide count which_label V3 REF
# 1 134199222 - A NA 1:134199222-134199222 ENSMUST00000086465 TRUE
# 1 134199222 - A NA 1:134199222-134199222 ENSMUST00000169927 TRUE
# 1 134199222 - A NA 1:134199222-134199222 ENSMUST00000038191 TRUE
# 1 134199222 - A NA 1:134199222-134199222 ENSMUST00000086465 TRUE
# 1 134199222 - A NA 1:134199222-134199222 ENSMUST00000169927 TRUE
# 1 134199222 - A NA 1:134199222-134199222 ENSMUST00000038191 TRUE
# (read.table() converts the REF value T to logical TRUE)

# set the factor levels explicitly so that we get output even for combinations absent from the data
res$strand <- factor(res$strand, levels = c('-', '+'))
res$nucleotide <- factor(res$nucleotide, levels = c('A', 'T', 'G', 'C'))
res$seqnames <- factor(res$seqnames, levels = unique(res$seqnames))
# fill = 0 (meant to turn NAs into 0)
# drop = FALSE: do not drop any missing rows, so every nucleotide/strand
# combination appears even if it is absent
results <- dcast(res, seqnames + pos + V3 ~ nucleotide + strand,
                 value.var = "count", fill = 0, drop = FALSE)
# message: Aggregation function missing: defaulting to length

# the results object looks like this:
# seqnames pos V3 A_- A_+ T_- T_+ G_- G_+ C_- C_+
# 1 134199222 ENSMUST00000038191 2 0 0 0 0 0 0 0
# 1 134199222 ENSMUST00000086465 2 0 0 0 0 0 0 0
# 1 134199222 ENSMUST00000169927 2 0 0 0 0 0 0 0
```
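As you can see, by default dcast computes the length and puts 2 in `A_-`, whereas I want 0 there, because the `count` values in the data frame are NA. I expected something like this: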
```
seqnames pos V3 A_- A_+ T_- T_+ G_- G_+ C_- C_+
1 134199222 ENSMUST00000038191 0 0 0 0 0 0 0 0
1 134199222 ENSMUST00000086465 0 0 0 0 0 0 0 0
1 134199222 ENSMUST00000169927 0 0 0 0 0 0 0 0
```
Why does it still aggregate with length even though I set `value.var = "count"`? Any help would be greatly appreciated!
Thanks!
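A minimal sketch of one way to get the expected zeros, assuming reshape2's `dcast()` (the aggregation message quoted above matches reshape2's wording) and that the package is installed. `value.var` only selects which column supplies the values; it does not say how duplicate rows are combined. Passing `fun.aggregate = sum` together with `na.rm = TRUE` (forwarded through `dcast()`'s `...` to `sum()`) turns each cell whose counts are all NA into 0, while `fill = 0` covers the combinations that have no rows at all:

```r
library(reshape2)  # assumption: dcast() is reshape2's

res <- read.table(text = 'seqnames pos strand nucleotide count which_label V3 REF
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000086465 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000169927 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000038191 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000086465 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000169927 T
1 134199222 - A NA 1:134199222-134199222 ENSMUST00000038191 T', header = TRUE)

# Same factor levels as in the question, so absent combinations still get columns
res$strand     <- factor(res$strand, levels = c('-', '+'))
res$nucleotide <- factor(res$nucleotide, levels = c('A', 'T', 'G', 'C'))
res$seqnames   <- factor(res$seqnames, levels = unique(res$seqnames))

# An explicit fun.aggregate replaces the default length();
# na.rm = TRUE is passed on to sum(), so all-NA cells become 0,
# and fill = 0 is used for cells with no rows at all.
results <- dcast(res, seqnames + pos + V3 ~ nucleotide + strand,
                 value.var = "count", fun.aggregate = sum, na.rm = TRUE,
                 fill = 0, drop = FALSE)
print(results)
```

One trade-off of this choice: with `sum` and `na.rm = TRUE`, a cell whose counts are all NA becomes indistinguishable from a genuine count of 0.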