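I am trying to plot three different BED files (one containing SNP data, another containing deletions, and a third containing duplications), but I cannot get the legend to include the values of all three layers unless I put all the data in a single file. The problem with merging the three files into one is that I cannot set ylim for each level of the variable.
Here is an example of one of my input files (the one containing the SNP information):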
chr10 47000019 47000019 rs150696937 2 +
chr11 1017064 1017064 NA 2 +
chr11 1017280 1017280 rs199539548 2 +
chr11 1017294 1017294 NA 2 +
chr11 1017756 1017756 NA 2 +
chr13 31898038 31898038 rs200460848 2 +
chr13 40298639 40298639 NA 2 +
chr13 48996928 48996928 rs530812916 2 +
chr13 50204777 50204777 rs117251022 2 +
chr14 20216005 20216005 rs566685404 2 +
chr14 20404076 20404076 rs114526346 2 +
chr21 10944668 10944668 rs138088406 2 +
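I am using the score column to specify the type of variant I want to plot, as follows: "1" = deletion, "2" = SNP, and "3" = duplication.
These are the libraries I am using: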
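For reference, the coding described above can be written down as a named character vector. This is only a minimal sketch: the name `variant_labels` and the example `score` values are hypothetical and do not come from the BED files.

```r
# Lookup from the score codes used in the BED files to readable labels.
# `variant_labels` is a hypothetical helper name, not part of ggbio.
variant_labels <- c("1" = "Deletion", "2" = "SNP", "3" = "Duplication")

# Hypothetical score values, stored as character strings in the same way
# that bed_to_granges() stores them (it calls as.character(score)).
score <- c("2", "2", "1", "3")

# Translate the codes into labels by name.
unname(variant_labels[score])
```

Keeping the mapping in one place makes it easier to keep the `values`, `breaks`/`limits` and `labels` arguments of the colour scale consistent with each other.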
## Load libraries and required databases
library(ggbio)
data(hg19IdeogramCyto, package = "biovizBase")
library(GenomicRanges)
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))
biovizBase::isIdeogram(hg19)
data("hg19IdeogramCyto", package = "biovizBase")
data("hg19Ideogram", package = "biovizBase")
I use the Bed2GRanges function provided on this site, http://davetang.org/muse/2015/02/04/bed-granges/, to convert my BED files into GRanges objects.
# Required Bed2GRanges function
# BED to GRanges
#
# This function loads a BED-like file and stores it as a GRanges object.
# The tab-delimited file must be ordered as 'chr', 'start', 'end', 'id', 'score', 'strand'.
# The minimal BED file must have the 'chr', 'start', 'end' columns.
# Any columns after the strand column are ignored.
#
# @param file Location of your file
# @keywords BED GRanges
# @export
# @examples
# bed_to_granges('my_bed_file.bed')
bed_to_granges <- function(file){
  df <- read.table(file,
                   header=F,
                   stringsAsFactors=F)
  if(length(df) > 6){
    df <- df[,-c(7:length(df))]
  }
  if(length(df)<3){
    stop("File has less than 3 columns")
  }
  header <- c('chr','start','end','id','score','strand')
  names(df) <- header[1:length(names(df))]
  if('strand' %in% colnames(df)){
    df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand)
  }
  library("GenomicRanges")
  if(length(df)==3){
    gr <- with(df, GRanges(chr, IRanges(start, end)))
  } else if (length(df)==4){
    gr <- with(df, GRanges(chr, IRanges(start, end), id=id))
  } else if (length(df)==5){
    gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score)))
  } else if (length(df)==6){
    gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score), strand=strand))
  }
  return(gr)
}
I import my BED file:
## Import bed files as GRanges file
SNP <- bed_to_granges("SNPs.bed")
seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
SNP_dn <- keepSeqlevels(SNP, paste0("chr", c(1:22, "X", "Y")))
I plot the data:
#Plotting SNP_dn according to score column
test <- autoplot(SNP_dn, aes(color = score)) +
  scale_color_manual("Variant type",
                     values = c("black", "red", "blue"),
                     breaks = c("2","1","3"),
                     drop = FALSE,
                     labels = c("SNP", "Deletion", "Duplication")) +
  theme(legend.position = "right")
test
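Even though I specify the option drop = FALSE, the "Deletion" and "Duplication" levels are still missing from the legend.
I have been struggling with this for several days and cannot figure out how to solve it.
I would like a legend containing the three levels I specify with the scale_color_manual() function (i.e. "SNP", "Deletion", "Duplication"), even if some of them are not present in the BED file.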
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] biovizBase_1.20.0 ggbio_1.20.2 GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 IRanges_2.6.1
[6] S4Vectors_0.10.3 ggplot2_2.1.0 BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 lattice_0.20-34 Rsamtools_1.24.0
[4] Biostrings_2.40.2 digest_0.6.10 mime_0.5
[7] R6_2.1.3 plyr_1.8.4 chron_2.3-47
[10] acepack_1.3-3.3 RSQLite_1.0.0 BiocInstaller_1.22.3
[13] httr_1.2.1 zlibbioc_1.18.0 GenomicFeatures_1.24.5
[16] data.table_1.9.6 rpart_4.1-10 Matrix_1.2-7.1
[19] labeling_0.3 splines_3.3.1 BiocParallel_1.6.6
[22] AnnotationHub_2.4.2 stringr_1.1.0 foreign_0.8-67
[25] RCurl_1.95-4.8 biomaRt_2.28.0 munsell_0.4.3
[28] shiny_0.14 httpuv_1.3.3 rtracklayer_1.32.2
[31] htmltools_0.3.5 nnet_7.3-12 SummarizedExperiment_1.2.3
[34] gridExtra_2.2.1 interactiveDisplayBase_1.10.3 Hmisc_3.17-4
[37] XML_3.98-1.4 reshape_0.8.5 GenomicAlignments_1.8.4
[40] bitops_1.0-6 RBGL_1.48.1 grid_3.3.1
[43] xtable_1.8-2 GGally_1.2.0 gtable_0.2.0
[46] DBI_0.5-1 magrittr_1.5 scales_0.4.0
[49] graph_1.50.0 stringi_1.1.1 XVector_0.12.1
[52] reshape2_1.4.1 latticeExtra_0.6-28 Formula_1.2-1
[55] RColorBrewer_1.1-2 ensembldb_1.4.7 tools_3.3.1
[58] dichromat_2.0-0 OrganismDbi_1.14.1 BSgenome_1.40.1
[61] Biobase_2.32.0 survival_2.39-5 AnnotationDbi_1.34.4
[64] colorspace_1.2-6 cluster_2.0.4 VariantAnnotation_1.18.7
Thank you very much,
Best,
Answer 0 (score: 0)
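One option is to make sure your factor contains all the levels you want to plot. This lets drop = FALSE take effect.
You can do this with factor and its levels argument. For example, if I wanted to add the level 5 to mtcars$cyl: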
mtcars$cyl = factor(mtcars$cyl, levels = c("4", "5", "6", "8"))
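Applied to the question's data, the same idea might look like the sketch below. It uses a plain character vector as a stand-in for the score column, so the values shown are hypothetical. On the GRanges object itself the analogous assignment would be `SNP_dn$score <- factor(SNP_dn$score, levels = c("2", "1", "3"))`. Whether ggbio's autoplot keeps the unused levels when it builds the plot is an assumption that should be checked.

```r
# Stand-in for the score column of a SNP-only BED file (hypothetical values).
score <- c("2", "2", "2")

# Declare all three variant codes as levels, even though only "2" occurs.
score <- factor(score, levels = c("2", "1", "3"))

# The unused levels are now part of the factor ...
levels(score)

# ... and show up with a count of zero.
table(score)
```

With the unused levels declared, a discrete scale with drop = FALSE has something to keep.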
Another option is to replace breaks with limits in scale_color_manual. This approach does not depend on the actual factor levels in the data (so drop = FALSE does nothing).
scale_color_manual("Variant type",
values = c("black", "red", "blue"),
limits = c("2","1","3"),
labels = c("SNP", "Deletion", "Duplication"))
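To try the limits approach without ggbio, here is a minimal sketch using plain ggplot2 on a small data frame. The data frame `df`, the helper vector `variant_codes` and the dummy `y = 0` are illustrative assumptions, not part of the original code. The colours are given as a named vector so that each one is tied to its code regardless of order.

```r
library(ggplot2)

# Hypothetical SNP-only data: every row has score "2".
df <- data.frame(pos = c(1017064, 1017280, 1017294),
                 score = c("2", "2", "2"),
                 stringsAsFactors = FALSE)

# All codes that should appear in the legend, in legend order.
variant_codes <- c("2", "1", "3")

p <- ggplot(df, aes(x = pos, y = 0, color = score)) +
  # show.legend = TRUE asks the layer to draw a key for every legend entry;
  # newer ggplot2 releases may otherwise leave the keys of absent values blank.
  geom_point(show.legend = TRUE) +
  scale_color_manual("Variant type",
                     values = c("2" = "black", "1" = "red", "3" = "blue"),
                     limits = variant_codes,
                     labels = c("SNP", "Deletion", "Duplication")) +
  theme(legend.position = "right")
p
```

The same scale can be added to the autoplot() call from the question in place of the breaks version.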