SeqIO.parse in Biopython: which file format should I specify?

Time: 2019-04-11 11:05:42

Tags: python biopython fasta

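I am trying to use Biopython to extract information (e.g. C/G/A/T counts and CG%) from a multi-FASTA file. When I try to loop over every FASTA sequence in the file, I keep running into trouble: only the first record is printed.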

I suspect this has to do with my file format, since it is not a proper FASTA file, but I don't know how to change it.

input_file = open("inputfile.fa", 'r')
output_file = open('nucleotide_counts.txt','w')
output_file.write('Gene\tA\tC\tG\tT\tLength\tCG%\n')

# count the nucleotides in each record
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name 
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = (float(C_count + G_count) / length)*100
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
    (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)
output_file.close()
input_file.close()

This is what my multi-FASTA file looks like (start and end are specified):

>1:start-end
CGCCCCAGTGATGTAGCCGAA
>1:start-end
CGGCCACCCCGAAGCGTGGGG

My output file contains only one row:

Gene    A       C       G       T       Length  CG%
1:start-end 85      115     180     59      439     67.198178
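
A single row with a length of 439 suggests that several sequences were merged into one record, although the post does not confirm this. One way to check whether the file really contains several FASTA headers is to count them directly, without Biopython. The sketch below is only a diagnostic under stated assumptions: the helper name `count_fasta_headers` is hypothetical, and it treats any line that starts with `>` as a header.

```python
def count_fasta_headers(text):
    """Return the number of lines in `text` that start with '>'."""
    # str.splitlines() splits on \n, \r\n and bare \r alike, so the
    # count does not depend on which line-ending style the file uses.
    return sum(1 for line in text.splitlines() if line.startswith(">"))


# The two records shown in the question, as an in-memory string.
sample = (
    ">1:start-end\nCGCCCCAGTGATGTAGCCGAA\n"
    ">1:start-end\nCGGCCACCCCGAAGCGTGGGG\n"
)
header_count = count_fasta_headers(sample)
print(header_count)
```

To run it on the real file, pass `open("inputfile.fa").read()` instead of `sample`. If the count is lower than the number of sequences expected, the `>` characters are probably not at the start of a line in that file.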

1 answer:

Answer 0 (score: 0)


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Dec 31 15:11:53 2020

@author: Pietro
"""
from Bio import SeqIO

input_file = 'fasta'  # path to the multi-FASTA file
output_file_name = input_file + 'out'

# The with-block closes the output file even if an error occurs.
with open(output_file_name, 'w') as output_file:
    output_file.write('Gene\tA\tC\tG\tT\tLength\tCG%\n')

    # SeqIO.parse accepts a file name as well as an open handle.
    for cur_record in SeqIO.parse(input_file, "fasta"):
        # Count the nucleotides in this record.
        gene_name = cur_record.name
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_percentage = (float(C_count + G_count) / length) * 100
        output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (
            gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
        output_file.write(output_line)
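
The per-record arithmetic in the loop above can be checked on its own, independently of how the file is parsed. The following is a minimal sketch and not part of the original answer: `nucleotide_stats` is a hypothetical helper that takes a plain string rather than a Biopython record. Unlike the original loop, it upper-cases the sequence and guards against an empty record; both are assumptions about what is wanted.

```python
def nucleotide_stats(name, sequence):
    """Build one tab-separated output row for a sequence given as a plain string."""
    # Assumption: lowercase bases should be counted too; the original
    # loop is case-sensitive and would skip them.
    seq = sequence.upper()
    counts = [seq.count(base) for base in "ACGT"]  # order: A, C, G, T
    length = len(seq)
    # C and G are at indexes 1 and 2; avoid dividing by zero for empty records.
    cg_percentage = 100.0 * (counts[1] + counts[2]) / length if length else 0.0
    return "%s\t%i\t%i\t%i\t%i\t%i\t%f\n" % (name, *counts, length, cg_percentage)


# First record from the question.
row = nucleotide_stats("1:start-end", "CGCCCCAGTGATGTAGCCGAA")
print(row, end="")
```

With Biopython, the same helper could be called as `nucleotide_stats(cur_record.name, str(cur_record.seq))` inside the `SeqIO.parse` loop.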