I have a GenBank (.gb) file and need to extract some specific features from it: the names and sizes of the protein-coding genes.
LOCUS NC_008137 15318 bp DNA linear MAM 15-APR-2009
DEFINITION Phalanger interpositus mitochondrion, complete genome.
ACCESSION NC_008137
VERSION NC_008137.1 GI:108793518
DBLINK Project: 17043
KEYWORDS .
SOURCE mitochondrion Phalanger interpositus (Stein's cuscus)
ORGANISM Phalanger interpositus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Metatheria; Diprotodontia; Phalangeridae; Phalanger.
REFERENCE 1 (bases 1 to 15318)
AUTHORS Munemasa,M., Nikaido,M., Donnellan,S., Austin,C.C., Okada,N. and
Hasegawa,M.
TITLE Phylogenetic analysis of diprotodontian marsupials based on
complete mitochondrial genomes
JOURNAL Genes Genet. Syst. 81 (3), 181-191 (2006)
PUBMED 16905872
REFERENCE 2 (bases 1 to 15318)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (12-JUN-2006) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 3 (bases 1 to 15318)
AUTHORS Munemasa,M., Nikaido,M., Donnellan,S., Austin,C.C., Okada,N. and
Hasegawa,M.
TITLE Direct Submission
JOURNAL Submitted (08-NOV-2005) Tokyo Institute of Technology, Graduate
School of Bioscience and Biotechnology; Nagatsuta-cho 4259-B-21,
Midori-ku, Kanagawa 226-8501, Japan
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AB241057.
Genome sequence lacks part of non-coding region.
COMPLETENESS: full length.
FEATURES Location/Qualifiers
source 1..15318
/organism="Phalanger interpositus"
/organelle="mitochondrion"
/mol_type="genomic DNA"
/db_xref="taxon:356347"
/tissue_type="liver"
/common="Stein's cuscus"
tRNA 1..69
/product="tRNA-Phe"
rRNA 72..1018
/product="s-rRNA"
/note="12S ribosomal RNA"
tRNA 1020..1088
/product="tRNA-Val"
rRNA 1089..2653
/product="l-rRNA"
/note="16S ribosomal RNA"
tRNA 2654..2727
/product="tRNA-Leu"
/codon_recognized="UUR"
gene 2729..3685
/gene="ND1"
/db_xref="GeneID:4117948"
CDS 2729..3685
/gene="ND1"
/codon_start=1
/transl_table=2
/product="NADH dehydrogenase subunit 1"
/protein_id="YP_637062.1"
/db_xref="GI:108793519"
/db_xref="GeneID:4117948"
/translation="MFIINLLMYIIPILLAIAFLTLVERKALGYMQFRKGPNVVGPYG
LLQPIADGMKLFSKEPLQPVTSSTTMFIIAPTLALTLSLTMWTPLPMPHSLIDLNLGL
LFILALSGLSVYSILWSGWASNSKYALMGALRAVAQTISYEVTLAIILLSIMLINGSF
TLKNLITTQENMWLIITTWPLVMMWYVSTLAETNRAPLDLTEGESELVSGFNVEYAAG
PFAMFFLAEYANIMLMNAMTTILFLGSSINHNFTHLNTLSFMTKTIALTFLFLWVRAS
YPRFRYDQLMHLLWKNFLPMTLAMCLWFISIPIALSCIPPQI"
misc_feature 2729..3682
/gene="ND1"
/note="NADH dehydrogenase; Region: NADHdh; cl00469"
/db_xref="CDD:186018"
tRNA 3686..3751
/product="tRNA-Ile"
tRNA complement(3750..3821)
/product="tRNA-Gln"
tRNA 3821..3878
/product="tRNA-Met"
gene 3889..4932
/gene="ND2"
/db_xref="GeneID:4117949"
CDS 3889..4932
/gene="ND2"
/codon_start=1
/transl_table=2
/product="NADH dehydrogenase subunit 2"
/protein_id="YP_637063.1"
/db_xref="GI:108793520"
/db_xref="GeneID:4117949"
/translation="MSPYILLIMLTSLLLGTSLTLFSNHWLTAWMGLEINTLAIIPMM
TYPNHPRATESAIKYFLTQSTASMMLMFAIINNAWMTNQWTLLQTSDQTSSTIMTLAL
AMKLGLAPFHFWVPEVTQGIPLTSGMILLTWQKIAPTSLMYQISPSLNMKILVMLALL
STILGGWGGLNQTHMRKILAYSSIAHMGWMTIIILINPTLTLLNLAIYITTTLTLFLA
LNHSSITKIKSLANLWNKSSSMTIVIALTLLSLGGLPPLTGFMPKWLILQELITYNNI
ATATMMAMSALLNLFFYMRIIYTTTLTMPPSINNSKLQWPHPQTKTTNIIPLLTIISS
FLLPLTPLSITLS"
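I tried using SeqFeature and sub-features, but it did not work.
From this file I should get (ND1 and 2729..3685, ND2 and 3889..4932, ... and so on if there are more).
I am new to Biopython and would appreciate help on how to do this.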
Answer 0 (score: 2)
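The GenBank file you posted is incomplete: part of the record is missing, and it lacks the final //
terminator line. The parser then gets stuck trying to read it.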
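A quick way to check for this problem before parsing is to confirm that the last non-blank line of the file is `//`. The following is only a minimal sketch in plain Python; the helper name `ends_with_terminator` is made up for illustration and is not part of Biopython.

```python
def ends_with_terminator(text):
    """Return True if the last non-blank line of a GenBank flat file is '//'."""
    # Drop blank lines so trailing newlines do not hide the terminator.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "//"


# Usage (the path is an assumption, adjust it to your own file):
# with open("NC_008137.gbk") as handle:
#     print(ends_with_terminator(handle.read()))
```

If this returns False, the record was probably truncated when it was copied, and downloading it again is safer than patching it by hand.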
I downloaded the correct file for the Phalanger interpositus mitochondrion from here, and then ran (Python 3 code):
>>>
>>> from Bio import SeqIO
>>> arch = "C:/code/NC_008137.gbk"
>>> record = SeqIO.parse(arch, "genbank")
>>> rec = next(record) # there is only one record
>>> for f in rec.features:
...     if f.type == 'gene':
...         print(f.qualifiers['gene'], f.location)
...
['ND1'] [2728:3685]
['ND2'] [3888:4932]
['COX1'] [5365:6919]
['COX2'] [7052:7737]
['ATP8'] [7798:8005]
['ATP6'] [7959:8640]
['COX3'] [8639:9423]
['ND3'] [9488:9837]
['ND4L'] [9906:10203]
['ND4'] [10196:11574]
['ND5'] [11773:13582]
['ND6'] [13578:14082]
['CYTB'] [14155:15301]
>>>
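Note that Biopython stores locations as 0-based, end-exclusive intervals, so `[2728:3685]` corresponds to `2729..3685` in the GenBank file. Since the question also asks for sizes and for protein-coding genes specifically, here is a hedged sketch that filters on `CDS` features instead of `gene` and reports the GenBank-style coordinates together with the length. The function name `cds_names_and_sizes` and the `"?"` fallback for a missing `/gene` qualifier are my own choices for illustration.

```python
from Bio import SeqIO


def cds_names_and_sizes(record):
    """Return (gene name, 1-based start, end, length in bp) for each CDS feature."""
    rows = []
    for feature in record.features:
        if feature.type != "CDS":
            continue  # skip source, tRNA, rRNA, gene, misc_feature, ...
        # Qualifier values are lists; fall back to "?" if /gene is absent.
        name = feature.qualifiers.get("gene", ["?"])[0]
        start = int(feature.location.start) + 1  # 0-based -> 1-based
        end = int(feature.location.end)          # end is exclusive, so unchanged
        rows.append((name, start, end, len(feature.location)))
    return rows


# Usage (the path is an assumption, adjust it to your own file):
# record = SeqIO.read("NC_008137.gbk", "genbank")  # read() expects one record
# for name, start, end, size in cds_names_and_sizes(record):
#     print(f"{name}\t{start}..{end}\t{size} bp")
```

For ND1 this should give `2729..3685` with a size of 957 bp, which matches the coordinates in the file (3685 - 2729 + 1).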