Matching nucleotide positions to sequences in a FASTA file

Time: 2017-08-21 09:48:26

Tags: python bioinformatics biopython fasta

I have a list of positions and would like to convert them into the nucleotides found at those positions, given a custom FASTA file. For example:

chr1 1000
chr2 2000
chr3 4000

Is there an existing tool in Python that can do this?

1 answer:

Answer 0: (score: 3)

Given a FASTA file chromosomes.fasta

>chr1
GATTACA
>chr2
ATTACGA
>chr3
GCCAACG

and a positions file positions.txt

chr1 3

chr2 4

chr3 5

you can use the following code:

from Bio import SeqIO

# Load every FASTA record into a dict keyed by record ID (chr1, chr2, ...)
record_dict = SeqIO.to_dict(SeqIO.parse('chromosomes.fasta', "fasta"))

# Read "chromosome position" pairs, skipping blank lines
chromosome_positions = {}
with open('positions.txt') as f:
    for line in f.read().splitlines():
        if line:
            chromosome, position = line.split()
            chromosome_positions[chromosome] = int(position)


for chromosome, position in chromosome_positions.items():
    record = record_dict[chromosome]
    # Indexing a SeqRecord with an integer returns the letter at that zero-based index
    base = record[position]
    print(chromosome, position, base)

This will output:

chr1 3 T
chr2 4 C
chr3 5 C

Note that Python uses zero-based indexing, so position 5 in positions.txt gives you the sixth base of the corresponding sequence.
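
If your positions are one-based instead (an assumption here, since the question does not say which convention the position list uses), you need to subtract 1 before indexing. A minimal sketch of that conversion on a plain string follows; base_at and its one_based parameter are hypothetical names used only for illustration, not part of Biopython:

```python
def base_at(sequence, position, one_based=True):
    """Return the base at `position` in `sequence`.

    With one_based=True, position 1 is the first base; otherwise the
    position is used directly as a zero-based Python index.
    """
    index = position - 1 if one_based else position
    # Reject out-of-range positions explicitly; a negative index would
    # otherwise silently count from the end of the sequence.
    if not 0 <= index < len(sequence):
        raise IndexError(
            f"position {position} is outside a sequence of length {len(sequence)}"
        )
    return sequence[index]


sequence = "GCCAACG"  # the chr3 sequence from chromosomes.fasta
print(base_at(sequence, 5, one_based=False))  # zero-based: sixth base, prints C
print(base_at(sequence, 5))                   # one-based: fifth base, prints A
```

The same subtraction can be applied to position in the loop above before indexing the record. The explicit range check is there because a position of 0 in one-based input would otherwise become index -1 and return the last base without any error.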