使用autoplot()+ scale_color_manual()函数在图例中缺少级别

时间:2016-09-16 15:24:36

标签: r plot ggplot2 legend bioconductor

我正在尝试绘制三个不同的床文件(一个包含SNP数据,另一个包含删除,第三个包含重复数据),但我无法管理图例以包含三个图层的值,除非我放入数据完全在同一个文件中。将三个文件合并为一个文件的问题是我无法为变量的每个级别设置ylim。

这是我的一个输入文件(包含SNP信息的文件)的示例:

chr10    47000019    47000019    rs150696937    2    +
chr11    1017064    1017064    NA    2    +
chr11    1017280    1017280    rs199539548    2    +
chr11    1017294    1017294    NA    2    +
chr11    1017756    1017756    NA    2    +
chr13    31898038    31898038    rs200460848    2    +
chr13    40298639    40298639    NA    2    +
chr13    48996928    48996928    rs530812916    2    +
chr13    50204777    50204777    rs117251022    2    +
chr14    20216005    20216005    rs566685404    2    +
chr14    20404076    20404076    rs114526346    2    +
chr21    10944668    10944668    rs138088406    2    +

我正在使用得分列来指定我想以下列方式绘制的变体类型:“1”=删除; “2”= SNP和“3”=重复。

这些是我正在使用的库:

## Load libraries and required databases
library(ggbio)
data(hg19IdeogramCyto, package = "biovizBase")

library(GenomicRanges)
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))

biovizBase::isIdeogram(hg19)

data("hg19IdeogramCyto", package = "biovizBase")

data("hg19Ideogram", package = "biovizBase")

我使用本网站提供的Bed2GRanges功能:http://davetang.org/muse/2015/02/04/bed-granges/将我的床文件转换为GRanges对象。

# Required Bed2GRanges function

# BED to GRanges
#
# This function loads a BED-like file and stores it as a GRanges object.
# The tab-delimited file must be ordered as 'chr', 'start', 'end', 'id', 'score', 'strand'.
# The minimal BED file must have the 'chr', 'start', 'end' columns.
# Any columns after the strand column are ignored.
#
# @param file Location of your file
# @keywords BED GRanges
# @export
# @examples
# bed_to_granges('my_bed_file.bed')

bed_to_granges <- function(file){
        df <- read.table(file,
                         header=F,
                         stringsAsFactors=F)

        if(length(df) > 6){
                df <- df[,-c(7:length(df))]
        }

        if(length(df)<3){
                stop("File has less than 3 columns")
        }

        header <- c('chr','start','end','id','score','strand')
        names(df) <- header[1:length(names(df))]

        if('strand' %in% colnames(df)){
                df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand)
        }

        library("GenomicRanges")

        if(length(df)==3){
                gr <- with(df, GRanges(chr, IRanges(start, end)))
        } else if (length(df)==4){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id))
        } else if (length(df)==5){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score)))
        } else if (length(df)==6){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score), strand=strand))
        }
        return(gr)
}

我导入我的床文件:

## Import bed files as GRanges file
SNP <- bed_to_granges("SNPs.bed")
seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
SNP_dn <- keepSeqlevels(SNP, paste0("chr", c(1:22, "X", "Y")))

我绘制数据:

#Plotting SNP_dn according to score column
test <- autoplot(SNP_dn, aes(color = score)) +
        scale_color_manual("Variant type",
                           values = score <- c("black", "red", "blue"),
                           breaks = c("2","1","3"),
                           drop = FALSE,
                           labels = c("SNP", "Deletion", "Duplication")) +
        theme(legend.position = "right")

test

即使我指定选项drop = FALSE,我仍然会错过图例中的“删除”和“复制”级别。

我几天来一直在努力解决这个问题,但我无法弄清楚如何解决它。

我想有一个包含我用scale_color_manual()函数指定的三个级别的图例(即“SNP”,“删除”,“复制”),即使它们中没有任何一个在床上文件。

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biovizBase_1.20.0    ggbio_1.20.2         GenomicRanges_1.24.3 GenomeInfoDb_1.8.7   IRanges_2.6.1      
[6] S4Vectors_0.10.3     ggplot2_2.1.0        BiocGenerics_0.18.0 

loaded via a namespace (and not attached):
[1] Rcpp_0.12.7                   lattice_0.20-34               Rsamtools_1.24.0            
[4] Biostrings_2.40.2             digest_0.6.10                 mime_0.5                    
[7] R6_2.1.3                      plyr_1.8.4                    chron_2.3-47                
[10] acepack_1.3-3.3               RSQLite_1.0.0                 BiocInstaller_1.22.3        
[13] httr_1.2.1                    zlibbioc_1.18.0               GenomicFeatures_1.24.5      
[16] data.table_1.9.6              rpart_4.1-10                  Matrix_1.2-7.1              
[19] labeling_0.3                  splines_3.3.1                 BiocParallel_1.6.6          
[22] AnnotationHub_2.4.2           stringr_1.1.0                 foreign_0.8-67              
[25] RCurl_1.95-4.8                biomaRt_2.28.0                munsell_0.4.3               
[28] shiny_0.14                    httpuv_1.3.3                  rtracklayer_1.32.2          
[31] htmltools_0.3.5               nnet_7.3-12                   SummarizedExperiment_1.2.3  
[34] gridExtra_2.2.1               interactiveDisplayBase_1.10.3 Hmisc_3.17-4                
[37] XML_3.98-1.4                  reshape_0.8.5                 GenomicAlignments_1.8.4     
[40] bitops_1.0-6                  RBGL_1.48.1                   grid_3.3.1                  
[43] xtable_1.8-2                  GGally_1.2.0                  gtable_0.2.0                
[46] DBI_0.5-1                     magrittr_1.5                  scales_0.4.0                
[49] graph_1.50.0                  stringi_1.1.1                 XVector_0.12.1              
[52] reshape2_1.4.1                latticeExtra_0.6-28           Formula_1.2-1               
[55] RColorBrewer_1.1-2            ensembldb_1.4.7               tools_3.3.1                 
[58] dichromat_2.0-0               OrganismDbi_1.14.1            BSgenome_1.40.1             
[61] Biobase_2.32.0                survival_2.39-5               AnnotationDbi_1.34.4        
[64] colorspace_1.2-6              cluster_2.0.4                 VariantAnnotation_1.18.7     

非常感谢,

最佳,

1 个答案:

答案 0 :(得分:0)

一个选项是确保您的因子包含您想要绘制的所有级别。这将使drop = FALSE生效。

您可以通过factorlevels参数执行此操作。例如,如果我想将级别5添加到mtcars :: cyl:

mtcars$cyl = factor(mtcars$cyl, levels = c("4", "5", "6", "8"))

另一种选择是将breaks替换为limits中的scale_color_manual。这种方法不依赖于数据中的实际因子水平(因此drop = FALSE没有做任何事情)。

scale_color_manual("Variant type",
                values = c("black", "red", "blue"),
                limits = c("2","1","3"),
                labels = c("SNP", "Deletion", "Duplication"))