I am processing paired-end BAM files and get many warnings like these:
WARNING: Could not find pair for HWI-ST430:177:2:1:4979:15503#0
WARNING: Could not find pair for HWI-ST430:177:2:1:5127:13427#0
WARNING: Could not find pair for HWI-ST430:177:2:1:6521:21452#0
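I looked up the reads named in the warnings in the BAM file and found that every one of them has three records with the same read name. For example: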
HWI-ST430:177:2:1:4979:15503#0 65 chr32 26100696 60 79M21S chr5 36697147 0 ACTTTGCAATTTAAGTTTTACTTACTTTTTAACTAATATACATGCCTAAAATTTACAAAAACAATAATAAAAACAACAGAACACTGGAAACATTTTTAAA >;=<>=<<=======<====;===;=======<=>>>>>><=>>==>>>>=>>>>==>?>=<<==>?>>>?>?==><=?>><=<>>>?>?=>??>?===> BD:Z:FFHFCIKKIHG@EEEHF??DGGEDGGE???DEEGGEFFFFGDHHHHGGE??FF?DGDG???EDGFGFGGF@@@FEHFEIEGFEEIJJIHBHGLJDD@EF@ MD:Z:79 PG:Z:MarkDuplicates RG:Z:Basenji BI:Z:FFIECHGIHFEAFEEHEAAFFHDFFHDAAAFEEIHFGGHGGGHHGHHHFBBGFBGGGHBBBFGHGGFGGFBBBGHIGHJGHGHFKJJJJEIKLJGHBGFB NM:i:0 AS:i:79 XS:i:19
HWI-ST430:177:2:1:4979:15503#0 129 chr5 36697147 60 72M28S chr32 26100696 0 ATTTGCCCCTGGGCTATTTTTTTCCTNCCATGTAAGATTCCGTTTTAAAAATGTTTCCAGTGTTCTGTTGTTTTTATTATTGTTTTTGTAAATTTTAGGC ===<=<<<<====<=>========<<!<<<=><<=>>>>>=5=>>>>>>>>>>=>>>==>=>=>>>>=?>=>>>>>>>>=?>=>>>?>>>??>??>;<=> SA:Z:chr32,26100739,-,36M64S,60,0; BD:Z:FFG@JKKFFHIIEHIGFF?????EGGEEEGHHEGEEDGFEGEGF??DE???FHEF?EGGHIFFGFEIFGGFG@@@EGGEGGGFHAAAHGJHBJJDDEHHI MD:Z:26T37T7 PG:Z:MarkDuplicates RG:Z:Basenji BI:Z:FFFBHHHFFHGGDGHGGEAAAAADFGEEEIHHGHFFFGFEGHHFBBGFBBBGHGFBEGIIIFGFEFHGFHHGCCCHIGHIGHHGDDDIIKIFKJGHGHGH NM:i:2 AS:i:65 XS:i:21
HWI-ST430:177:2:1:4979:15503#0 401 chr32 26100739 60 36M64H = 26100696 -79 GCCTAAAATTTACAAAAACAATAATAAAAACAACAG ===<=>>=>>===>===<=>===========>;=== SA:Z:chr5,36697147,+,72M28S,60,2; BD:Z:IHHE??FF?EGEF???FEFFFDFGE@@AHHIJFIFF MD:Z:36 PG:Z:MarkDuplicates RG:Z:Basenji BI:Z:HGHGBBFFAEGFFAAAEFFEGFEGFABBFGHGGHFF NM:i:0 AS:i:36 XS:i:22
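To make the three records easier to compare, here is a minimal sketch that decodes the FLAG values in their second column (65, 129 and 401). The bit meanings are taken from the FLAG field of the SAM specification; the function name `decode_flag` and the short label strings are just illustrative names I made up, not part of any tool.

```python
# Bit values follow the FLAG field of the SAM specification.
# The label strings are illustrative names, not official identifiers.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary_alignment"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary_alignment"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# FLAG values of the three records shown above.
for flag in (65, 129, 401):
    print(flag, decode_flag(flag))
```

Decoded this way, 65 is paired plus first in pair, 129 is paired plus second in pair, and 401 is paired, reverse strand, second in pair and secondary alignment. So the third record looks like an additional alignment of the second read rather than a third read, which also fits its SA tag pointing back at the chr5 record. I have not confirmed that this is what triggers the warning.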
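The BAM file contains HiSeq reads aligned to the reference genome with bwa, with duplicates handled by picard (MarkDuplicates). Base recalibration was done with GATK.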
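My questions are:
1. Why are there three records with the same read name, and yet no pair is found for them?
2. Perhaps the first two are treated as a pair and the third as a single read. Can I ignore it?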
Can anyone help me? Thank you very much for your help!