将LD矩阵图与SNP和基因数据结合使用ggplot / ggbio track()函数

时间:2017-01-24 22:59:17

标签: r plot ggplot2 bioconductor

基本上,我正在尝试做的是以与本网站类似的方式绘制某个基因组区域中的连锁不平衡(LD):https://analysistools.nci.nih.gov/LDlink/?tab=ldmatrix。问题是这个网站上的数字质量很低,我需要这个数字用于出版。我使用的所有示例文件都在帖子的末尾,以便您可以重现我的代码。

这是我到目前为止所做的:

library(ggplot2)
library(reshape)
library(scales)
library(ggbio)
library(GenomicRanges)
library(Homo.sapiens)
library(rtracklayer)
library(GenomicRanges)
library(BSgenome.Hsapiens.UCSC.hg19)
library(VariantAnnotation)
source("bed2granges.R")
data(hg19IdeogramCyto, package = "biovizBase")
data(hg19Ideogram, package = "biovizBase")
data(genesymbol, package = "biovizBase")
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))

我从LDlink网站下载了矩阵txt文件,将其用于我的LD图。我融合了数据框以获得评分颜色:

# FIRST PLOT - LD plot for CEU population

#Load r-squared values in Table format
CEU_plot<- read.table("r2_SOD1_matrix.txt", header = TRUE)

#Load chromosomal positions for each rs ID
RS_location <- read.table("SOD1_SNPs.bed", header = FALSE)

#Melt table to be able to make a gradient
CEU_plot_m <- melt(CEU_plot)

# x-axis values - chromosomal position
CEU_plot_m$RS_number <- factor(RS_location[,2], levels = RS_location[,2])

# y-axis - variable
CEU_plot_m$variable <- factor(CEU_plot_m$variable, levels = sort(CEU_plot_m$variable, decreasing = TRUE))

CEU = ggplot(CEU_plot_m, aes(x = RS_number, y = variable)) +
        labs (title = "r2 plot for SOD1 gene in CEU population ",
              x = element_blank (),
              y = "rs IDs") +
        theme(text = element_text(size=8),
              axis.text.x = element_blank(),
              axis.text.y = element_text(hjust=0)) +
        geom_tile(aes(fill = value), colour = "white") +
        scale_fill_gradient(low = "white" ,high = "red")
CEU

然后我用ggbio包生成我最后一个剧情的第三和第四首曲目

# DATA FOR REST OF PLOTS
wh <- genesymbol[c("SOD1")]
chr_name <- as.vector(slot(wh@seqnames,"values"))
chr_start <- slot(wh@ranges, "start") + 0 - 2000
chr_end <- slot (wh@ranges, "start") + slot (wh@ranges, "width") - 1 + 2000
gene_name <- slot(wh@ranges, "NAMES")

#THIRD PLOT - SOD1 SNPs
SNP <- bed_to_granges("SOD1_SNPs.bed")
SNP_chr <- slot(SNP@seqnames,"values")
if (chr_name %in% SNP_chr) {
        seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
        SNP_dn <- keepSeqlevels(unique(SNP), chr_name)
}

SNPs_plot <-  autoplot(SNP_dn,  xlab = chr_name) +
        guides (colour = TRUE) +
        theme_bw()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              panel.background = element_blank()) +
        theme(legend.position="none") +
        scale_color_manual(values = score <- c("black")) +
        scale_fill_manual (breaks = score <- c("2"),
                           values= score <- c("black"),
                           name = "Variant type",
                           labels = expression(bold(SNP))) +
        xlim(chr_start,chr_end) +
        scale_x_sequnit("Mb")
fixed(SNP_IDs) <- TRUE
SNPs_plot

##FOURTH PLOT - SOD1 gene
GENES_plot <-   autoplot(Homo.sapiens, which = wh, layout = "linear",  xlab = chr_name) +
        guides (colour = FALSE) +
        xlim(chr_start,chr_end) +
        scale_x_sequnit("Mb")
fixed(GENES_plot) <- TRUE
GENES_plot

# Main tracks plot title
plot_title <- as.expression(bquote('Variation in'~italic(.(gene_name))~'gene'))

# X axis scale according to gene start and gene end
scale_combined <- GRanges(chr_name, IRanges(start = chr_start, end = chr_end))

# Combination of SNPs plot with Genes plot

Gene_SNPs <-     tracks(SNPs = SNP_IDs,
                    Genes=GENES_plot,
                    heights = c(0.5,2.0),
                    xlim = scale_combined,
                    xlab = paste("\n",chr_name,"\n"),
                    title = plot_title,
                    label.bg.fill = "grey60") +
                    scale_x_sequnit("Mb")
Gene_SNPs

当我想将CEU图(第一轨)与SNP(第三轨)和基因(第四轨)图结合起来时,问题出现了。我需要将ggplot列表(CEU)与两个GGBio对象(SNPs_plot和GENES_plot)结合起来。最重要的是,我想在LD图和SNPs_plot之间生成一条额外的轨道,它将来自CEU图的x轴的离散值与来自SNPs_plot的连续比例的SNP联系起来。

总之,这就是我所拥有的:

LD情节: LD Plot 将SNP数据与Gene跟踪相结合: SNP and Gene data 这是我想要的最终组合情节: Final

这些是我正在使用的原始文件:

'r2_SOD1_matrix.txt'输入文件

https://drive.google.com/open?id=0B34ok3wh5PjTMklhNzBrT3JoM0k

'SOD1_SNPs.bed'输入文件

https://drive.google.com/open?id=0B34ok3wh5PjTaGx6QWdZd3pmbUE

'bed2granges.R'功能

https://drive.google.com/open?id=0B34ok3wh5PjTMHNFa1pod0liYzA

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] scales_0.4.1                            reshape_0.8.6                          
 [3] VariantAnnotation_1.20.2                Rsamtools_1.26.1                       
 [5] SummarizedExperiment_1.4.0              BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [7] BSgenome_1.42.0                         Biostrings_2.42.1                      
 [9] XVector_0.14.0                          rtracklayer_1.34.1                     
[11] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[13] org.Hs.eg.db_3.4.0                      GO.db_3.4.0                            
[15] OrganismDbi_1.16.0                      GenomicFeatures_1.26.2                 
[17] AnnotationDbi_1.36.1                    Biobase_2.34.0                         
[19] GenomicRanges_1.26.2                    GenomeInfoDb_1.10.2                    
[21] IRanges_2.8.1                           S4Vectors_0.12.1                       
[23] ggbio_1.22.3                            BiocGenerics_0.20.0                    
[25] ggplot2_2.2.1                          

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    AnnotationHub_2.6.4          
 [3] splines_3.3.2                 Formula_1.2-1                
 [5] shiny_1.0.0                   assertthat_0.1               
 [7] interactiveDisplayBase_1.12.0 latticeExtra_0.6-28          
 [9] RBGL_1.50.0                   yaml_2.1.14                  
[11] RSQLite_1.1-2                 backports_1.0.5              
[13] lattice_0.20-34               biovizBase_1.22.0            
[15] digest_0.6.11                 RColorBrewer_1.1-2           
[17] checkmate_1.8.2               colorspace_1.3-2             
[19] htmltools_0.3.5               httpuv_1.3.3                 
[21] Matrix_1.2-8                  plyr_1.8.4                   
[23] XML_3.98-1.5                  biomaRt_2.30.0               
[25] zlibbioc_1.20.0               xtable_1.8-2                 
[27] BiocParallel_1.8.1            htmlTable_1.8                
[29] tibble_1.2                    nnet_7.3-12                  
[31] lazyeval_0.2.0                survival_2.40-1              
[33] magrittr_1.5                  mime_0.5                     
[35] memoise_1.0.0                 GGally_1.3.0                 
[37] foreign_0.8-67                graph_1.52.0                 
[39] BiocInstaller_1.24.0          tools_3.3.2                  
[41] data.table_1.10.0             stringr_1.1.0                
[43] munsell_0.4.3                 cluster_2.0.5                
[45] ensembldb_1.6.2               grid_3.3.2                   
[47] RCurl_1.95-4.8                dichromat_2.0-0              
[49] labeling_0.3                  bitops_1.0-6                 
[51] base64enc_0.1-3               gtable_0.2.0                 
[53] DBI_0.5-1                     reshape2_1.4.2               
[55] R6_2.2.0                      GenomicAlignments_1.10.0     
[57] gridExtra_2.2.1               knitr_1.15.1                 
[59] Hmisc_4.0-2                   stringi_1.1.2                
[61] Rcpp_0.12.9                   rpart_4.1-10                 
[63] acepack_1.4.1  

0 个答案:

没有答案