Question

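I have a BAM file with 520817 reads at a certain position (as shown in IGV). However, when I use pysam to get the read names and the corresponding nucleotide at that position, I don't get anywhere near that number (only about 7000 reads). My guess was that I only get a read when its nucleotide at that position differs from the reference genome. Is there a workaround so that I get all the reads? I'm just starting out in bioinformatics, so please let me know what you need in order to help me!

Thanks a lot!

Here is my code: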

import pysam
import csv
import sys

# Write a table with columns: read ID, SNP location, nucleotide.
mybam = pysam.AlignmentFile("file.bam", "rb")
# Python 3: open the CSV in text mode with newline="" (Python 2 needed "wb").
with open("snp.csv", "w", newline="") as out:
    w = csv.writer(out, delimiter=",")
    w.writerow(["Read", "Loc", "Nucl"])
    for pileupcolumn in mybam.pileup('chr6', 29911198, 29911199):
        print("\ncoverage at base %s = %s" %
              (pileupcolumn.pos, pileupcolumn.n))
        # pileup() also yields flanking columns, so keep only the target position.
        if pileupcolumn.pos != 29911198:
            continue
        for pileupread in pileupcolumn.pileups:
            # query_position is None for deletions and reference skips.
            if pileupread.is_del or pileupread.is_refskip:
                continue
            base = pileupread.alignment.query_sequence[pileupread.query_position]
            w.writerow((pileupread.alignment.query_name, 29911198, base))
            print('\tbase in read %s = %s' % (pileupread.alignment.query_name, base))

mybam.close()

Answer 1

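Check the IGV options under View -> Preferences -> Alignments. Some of the "filter xxxx" options (duplicates, secondary alignments, low quality) can change what IGV displays.

By default, pysam does not pile up reads that have the BAM_FUNMAP, BAM_FSECONDARY, BAM_FQCFAIL, or BAM_FDUP flag set, so make sure the IGV view options match the options passed to pysam.AlignmentFile.pileup. Otherwise the two can produce different output.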
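To make the flag filtering concrete, here is a minimal sketch rather than a definitive fix. The flag values come from the SAM specification. The helper names `excluded_by` and `bases_at` are made up for illustration. The keyword arguments in the commented usage (`max_depth`, `flag_filter`, `min_base_quality`, `ignore_orphans`, `ignore_overlaps`) are what I understand recent pysam versions accept in `pileup`, so please check them against the documentation of your installed version. One thing that goes beyond the flags discussed above: as far as I know `pileup` also caps depth through `max_depth` (default 8000), which would be consistent with seeing only about 7000 reads.

```python
# SAM flag bits named in the answer (values from the SAM specification).
BAM_FUNMAP = 0x4        # read is unmapped
BAM_FSECONDARY = 0x100  # secondary alignment
BAM_FQCFAIL = 0x200     # failed platform/vendor quality checks
BAM_FDUP = 0x400        # PCR or optical duplicate

# The combination the answer says pysam skips by default.
DEFAULT_FLAG_FILTER = BAM_FUNMAP | BAM_FSECONDARY | BAM_FQCFAIL | BAM_FDUP

_FLAG_NAMES = {
    BAM_FUNMAP: "BAM_FUNMAP",
    BAM_FSECONDARY: "BAM_FSECONDARY",
    BAM_FQCFAIL: "BAM_FQCFAIL",
    BAM_FDUP: "BAM_FDUP",
}


def excluded_by(flag, flag_filter=DEFAULT_FLAG_FILTER):
    """Return the names of the filter bits that would exclude a read with this flag."""
    return [name for bit, name in _FLAG_NAMES.items() if flag & flag_filter & bit]


def bases_at(bam, contig, pos, **pileup_kwargs):
    """Yield (read_name, base) for reads covering the 0-based position `pos`.

    `bam` is assumed to behave like an open pysam.AlignmentFile; any extra
    keyword arguments are passed straight through to its pileup() method.
    """
    # truncate=True restricts the columns to the requested region.
    for column in bam.pileup(contig, pos, pos + 1, truncate=True, **pileup_kwargs):
        for read in column.pileups:
            # query_position is None for deletions and reference skips.
            if read.is_del or read.is_refskip:
                continue
            aln = read.alignment
            yield aln.query_name, aln.query_sequence[read.query_position]


# Hypothetical usage with a permissive configuration (assumed keyword names):
#
#   import pysam
#   with pysam.AlignmentFile("file.bam", "rb") as bam:
#       rows = list(bases_at(bam, "chr6", 29911198,
#                            max_depth=1000000,      # lift the depth cap
#                            flag_filter=0,          # keep all flags
#                            min_base_quality=0,     # keep low-quality bases
#                            ignore_orphans=False,   # keep unpaired mates
#                            ignore_overlaps=False)) # keep overlapping mates
```

If the count still differs from IGV after this, `excluded_by(read.flag)` on a few reads obtained via `fetch` can show which filter bit is responsible, and IGV's own downsampling and filter settings are worth comparing too.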

BAM file: get all reads at a specific position using pysam