How to show both chromosome positions and SNP labels on the x-axis

Date: 2016-09-09 17:57:05

Tags: r plot ggplot2 bioconductor

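I am trying to plot a set of SNPs with both their chromosome positions and their labels on the x-axis.

I have imported the SNP information from a BED file into a GRanges object.

My BED file looks like this: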

chr17    78191000    78191000    rsAAA    1    +
chr17    78191900    78191900    rsBBB    1    +
chr17    78194002    78194002    rsCCC    1    +
chr17    78197170    78197170    rsDDD    1    +

The function I use to convert the BED file into a GRanges object comes from this site: http://davetang.org/muse/2015/02/04/bed-granges/

bed_to_granges <- function(file){
        df <- read.table(file,
                         header=F,
                         stringsAsFactors=F)

        if(length(df) > 6){
                df <- df[,-c(7:length(df))]
        }

        if(length(df)<3){
                stop("File has less than 3 columns")
        }

        header <- c('chr','start','end','id','score','strand')
        names(df) <- header[1:length(names(df))]

        if('strand' %in% colnames(df)){
                df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand)
        }

        library("GenomicRanges")

        if(length(df)==3){
                gr <- with(df, GRanges(chr, IRanges(start, end)))
        } else if (length(df)==4){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id))
        } else if (length(df)==5){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score)))
        } else if (length(df)==6){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score), strand=strand))
        }
        return(gr)
}

The code that imports the BED file and sets its sequence information according to the human hg19 build is:

library(ggbio)
data(hg19Ideogram, package = "biovizBase")
setwd(".../Test")

## Import bed file as GRanges file
SNP <- bed_to_granges("SNP_position.bed")
seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
SNP_dn <- keepSeqlevels(SNP, paste0("chr", c(1:22, "X", "Y")))

I tried to plot the SNPs as follows:

SNP_location <-  autoplot(SNP_dn) +
        theme(text = element_text(size=8),
              axis.text.x = element_text(angle=45, hjust=1)) +
        theme(legend.position="none") +        
        xlim(78190000,78200000) +
        scale_x_sequnit("Mb")
fixed(SNP_location) <- TRUE
SNP_location

This code returns a plot with chromosome positions on the x-axis and the SNPs at their correct positions.

SNP_IDs <-  autoplot(SNP_dn) +
        scale_x_continuous(name = "\nSNP IDs",
                           breaks = as.vector(start(SNP_dn)),
                           labels = as.factor (SNP_dn$id)) +
        theme(text = element_text(size=8),
              axis.text.x = element_text(angle=45, hjust=1)) +
        theme(legend.position="none") +        
        xlim(78190000,78200000)
fixed(SNP_IDs) <- TRUE
SNP_IDs

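This code returns a rescaled x-axis whose ticks correspond to the positions and labels of the SNPs themselves, but I lose the chromosome reference.

I would like a plot like the first one, with the x-axis scaled by chromosome position, plus a second line somewhere in the same plot that contains the SNP names.

I want to combine this plot with others that show additional features of the same region using the ggbio tracks function, and for that they all need to share the same chromosome limits.

Is there a simple way to label the SNPs while keeping the chromosome scale of the original x-axis?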

Many thanks,

Best,

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [3] org.Hs.eg.db_3.3.0                      GO.db_3.3.0                            
 [5] OrganismDbi_1.14.1                      GenomicFeatures_1.24.5                 
 [7] AnnotationDbi_1.34.4                    Biobase_2.32.0                         
 [9] GenomicRanges_1.24.2                    GenomeInfoDb_1.8.3                     
[11] IRanges_2.6.1                           S4Vectors_0.10.3                       
[13] biovizBase_1.20.0                       ggbio_1.20.2                           
[15] ggplot2_2.1.0                           BiocGenerics_0.18.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6                   lattice_0.20-33               Rsamtools_1.24.0             
 [4] Biostrings_2.40.2             digest_0.6.10                 mime_0.5                     
 [7] R6_2.1.3                      plyr_1.8.4                    chron_2.3-47                 
[10] acepack_1.3-3.3               RSQLite_1.0.0                 httr_1.2.1                   
[13] BiocInstaller_1.22.3          zlibbioc_1.18.0               data.table_1.9.6             
[16] rpart_4.1-10                  Matrix_1.2-6                  labeling_0.3                 
[19] splines_3.3.1                 BiocParallel_1.6.6            AnnotationHub_2.4.2          
[22] stringr_1.1.0                 foreign_0.8-66                RCurl_1.95-4.8               
[25] biomaRt_2.28.0                munsell_0.4.3                 shiny_0.13.2                 
[28] httpuv_1.3.3                  rtracklayer_1.32.2            htmltools_0.3.5              
[31] nnet_7.3-12                   SummarizedExperiment_1.2.3    gridExtra_2.2.1              
[34] interactiveDisplayBase_1.10.3 Hmisc_3.17-4                  XML_3.98-1.4                 
[37] reshape_0.8.5                 GenomicAlignments_1.8.4       bitops_1.0-6                 
[40] RBGL_1.48.1                   xtable_1.8-2                  GGally_1.2.0                 
[43] gtable_0.2.0                  DBI_0.5                       magrittr_1.5                 
[46] scales_0.4.0                  graph_1.50.0                  stringi_1.1.1                
[49] XVector_0.12.1                reshape2_1.4.1                latticeExtra_0.6-28          
[52] Formula_1.2-1                 RColorBrewer_1.1-2            ensembldb_1.4.7              
[55] tools_3.3.1                   dichromat_2.0-0               BSgenome_1.40.1              
[58] survival_2.39-5               colorspace_1.2-6              cluster_2.0.4                
[61] VariantAnnotation_1.18.7     

1 answer:

Answer 0 (score: 1)

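I think I found what I was looking for: it comes down to using the geom_text() function. You can build an int vector with the SNP positions and a chr vector with the SNP names. After that, adding + geom_text(x = int_vector, y = rep(1.3,4), label = chr_vector, angle = 45, hjust = -0.4, vjust = 0.2, size = 3) to the plot does the job. There may be simpler ways, and I would appreciate it if you shared them.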
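
To make the idea concrete, here is a minimal sketch of the same labelling approach using plain ggplot2 rather than ggbio's autoplot(). It is an illustration under assumptions, not the code from the answer: the data frame snps is rebuilt by hand from the four BED lines in the question, the SNPs are drawn as short vertical segments, and annotate("text", ...) stands in for the geom_text() call. The names snps, int_vector and chr_vector, the y values, and the axis title are illustrative choices.

```r
library(ggplot2)

# Assumption: the four SNPs from the BED file in the question,
# typed in by hand as a plain data frame instead of a GRanges object.
snps <- data.frame(
  pos = c(78191000, 78191900, 78194002, 78197170),
  id  = c("rsAAA", "rsBBB", "rsCCC", "rsDDD"),
  stringsAsFactors = FALSE
)

int_vector <- snps$pos  # integer-like vector of SNP positions
chr_vector <- snps$id   # character vector of SNP names

p <- ggplot(snps) +
  # One short vertical segment per SNP, placed at its genomic position.
  geom_segment(aes(x = pos, xend = pos, y = 0.9, yend = 1.1)) +
  # SNP names written above the segments; the x scale itself is untouched.
  annotate("text", x = int_vector, y = 1.3, label = chr_vector,
           angle = 45, hjust = 0, size = 3) +
  # Keep the genomic coordinate axis, shown in Mb, with the limits from the question.
  scale_x_continuous(name = "chr17 position (Mb)",
                     limits = c(78190000, 78200000),
                     labels = function(x) format(x / 1e6, nsmall = 3)) +
  scale_y_continuous(limits = c(0.5, 2)) +
  theme(axis.title.y = element_blank(),
        axis.text.y  = element_blank(),
        axis.ticks.y = element_blank())

p
```

Because the labels are drawn inside the panel instead of replacing the axis breaks, the x-axis stays in chromosome coordinates, which is what other tracks of the same region need in order to line up. With a ggbio plot the y value for the labels would probably need adjusting to sit above the drawn ranges, as the answer does with y = rep(1.3, 4).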