I ran the following TopHat (v2.1.0) command to align the reads in my RNA-seq FASTQ file with Bowtie2 (v2.2.6.0), using the Bowtie2 genome index for Homo_sapiens_UCSC_hg19 from iGenomes:
tophat2 -p 8 -G /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19 /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome HPDE_S11_L002_R1_001.fastq
My FASTQ file is about 13 GB, but after alignment my accepted_hits.bam file is only 50 MB.
Yet the alignment output says about 55 million reads were kept:
[2018-02-21 13:58:33] Checking for Bowtie
Bowtie version: 2.2.6.0
[2018-02-21 13:58:33] Checking for Bowtie index files (genome)..
[2018-02-21 13:58:33] Checking for reference FASTA file
[2018-02-21 13:58:33] Generating SAM header for /home/ajsn6c/Desktop /Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
[2018-02-21 13:58:35] Reading known junctions from GTF file
[2018-02-21 13:58:39] Preparing reads
left reads: min. length=12, max. length=101, 55970267 kept reads (45104 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2018-02-21 14:17:45] Building transcriptome data files Panc1/tmp/genes
[2018-02-21 14:17:59] Building Bowtie index from genes.fa
[2018-02-21 14:32:14] Mapping left_kept_reads to transcriptome genes with Bowtie2
[2018-02-21 15:38:44] Resuming TopHat pipeline with unmapped reads
[2018-02-21 15:38:44] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2
[2018-02-21 16:17:07] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2018-02-21 16:18:13] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2018-02-21 16:19:32] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2018-02-21 16:20:46] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2018-02-21 16:21:59] Searching for junctions via segment mapping
[2018-02-21 16:25:24] Retrieving sequences for splices
[2018-02-21 16:27:18] Indexing splices
Building a SMALL index
[2018-02-21 16:27:37] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2018-02-21 16:27:50] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2018-02-21 16:28:03] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2018-02-21 16:28:17] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2018-02-21 16:28:31] Joining segment hits
[2018-02-21 16:31:02] Reporting output tracks
[2018-02-22 19:21:42] A summary of the alignment counts can be found in ./tophat_out/align_summary.txt
[2018-02-22 19:21:42] Run complete: 02:08:37 elapse
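One way to cross-check the "kept reads" figure is to count the records in the FASTQ directly. The functions below are a minimal sketch, assuming an uncompressed FASTQ with the standard 4-line record layout and no wrapped sequence lines; count_fastq_reads and count_short_reads are hypothetical helpers, not part of TopHat. The second one counts reads below a length cutoff, since the log warns about reads shorter than 20 bp.

```shell
# Sketch: cross-check TopHat's "kept reads" count against the FASTQ itself.
# Assumes an uncompressed FASTQ with 4 lines per record
# (header, sequence, '+', qualities) and no wrapped sequence lines.

# Total number of records: one record per 4 lines.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Records whose sequence (line 2 of each record) is shorter than $2 bp,
# e.g. 20, the cutoff mentioned in TopHat's short-read warning.
count_short_reads() {
    awk -v min="$2" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }' "$1"
}
```

For example, count_fastq_reads HPDE_S11_L002_R1_001.fastq should print roughly 56,015,371 if the log is consistent, assuming the kept (55,970,267) and discarded (45,104) counts together make up the whole input.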
Here is the alignment summary from align_summary.txt:
reads:
Input : 926337
Mapped : 898584 (97.0% of input)
of these: 14621 ( 1.6%) have multiple alignments (14 have >20)
97.0% overall read mapping rate.
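To put the two numbers side by side, a small sketch can read the counts out of align_summary.txt and express them as a share of the kept reads. It assumes lines shaped like "Input : 926337" and "Mapped : 898584 (97.0% of input)", as in the summary above; summary_field and pct_of are hypothetical helper names.

```shell
# Sketch: pull counts out of a TopHat align_summary.txt and compare them
# with the "kept reads" total reported in the TopHat log.
# Assumes lines like "Input : 926337" (label, colon, count).

# Print the count that follows "<label> :" (first match only).
summary_field() {
    awk -v key="$1" '$1 == key && $2 == ":" { print $3; exit }' "$2"
}

# Print $1 as a percentage of $2, to two decimal places.
pct_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}
```

With the numbers in the question, pct_of "$(summary_field Input tophat_out/align_summary.txt)" 55970267 prints 1.66: the summary accounts for under 2% of the reads the log says were kept.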
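Why is the input only about 900K reads when 55 million reads were kept? The reads also have good quality scores. Any ideas would be appreciated!
Thanks, Alex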
Answer:
These entries in the log file are odd:
[2018-02-21 14:17:45] Building transcriptome data files Panc1/tmp/genes
[2018-02-21 14:17:59] Building Bowtie index from genes.fa
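To check which paths a given run actually used, one option is to pull them straight out of the log. The function below is a sketch that matches the message wording seen in this log; log_paths is a hypothetical helper, and other TopHat versions may word these lines differently.

```shell
# Sketch: print the transcriptome prefix and the summary location that a
# TopHat log reports, so they can be compared with the -G/-o values intended.
# Matches the message wording in the log above; other versions may differ.
log_paths() {
    sed -n \
        -e 's/.*Building transcriptome data files //p' \
        -e 's/.*summary of the alignment counts can be found in //p' \
        "$1"
}
```

On the log in the question this prints Panc1/tmp/genes followed by ./tophat_out/align_summary.txt: the first is where the transcriptome was built, the second is where the run wrote its summary.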
Here is your tophat2 command (I reorganized it for readability):
./tophat2 \
-p 8 \
-G /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19 /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf \
/home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome \
HPDE_S11_L002_R1_001.fastq
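There is a space in the -G path ([...]Homo_sapiens_UCSC_hg19 /Homo_sapiens[...]); I'm not sure whether this is the problem. Also, [...]/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf does not seem to have been used to build the transcriptome. I don't know where Panc1/tmp/genes comes from, but apparently that file, not [...]/hg19.gtf, was used to build the reference transcriptome.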
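To see what that space does, a dry run can print each argument the shell would hand to tophat2, one per line, without running TopHat. show_args is a hypothetical helper. Typed as in the question, -G receives only the Homo_sapiens_UCSC_hg19 directory and the .gtf path becomes a separate argument; TopHat's usage is tophat [options] <bowtie_index> <reads>, so the first positional argument would be taken as the index. The second call assumes the space is a typo and that hg19.gtf sits in the same Bowtie2Index directory as the genome index, as the index path suggests.

```shell
# Sketch: a dry run that prints each argument the shell would pass,
# one per line in brackets, instead of running tophat2.
# show_args is a hypothetical helper, not part of TopHat.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# As typed in the question: the space splits the GTF path in two,
# so -G gets only the directory and the .gtf path is a separate argument.
show_args -G /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19 /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf

# Assumption: the space is a typo and hg19.gtf lives next to the index.
# Quoting "$GTF" and "$INDEX" keeps each path a single argument.
BASE=/home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index
GTF="$BASE/hg19.gtf"
INDEX="$BASE/genome"
show_args tophat2 -p 8 -G "$GTF" "$INDEX" HPDE_S11_L002_R1_001.fastq
```

Once the printed arguments look right, dropping show_args runs the real command.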