How to write this in Python without using the Biopython package

Time: 2015-10-14 19:41:28

Tags: python biopython

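I would like to write a program that extracts the amino acid sequence corresponding to each feature of type "Region" into a separate FASTA file, and that lists the amino acid and position of every "Site" feature with site_type="phosphorylation".

It must not use the Biopython package.

(My Biopython code already does the same thing.)

The file is below.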

LOCUS       NP_005219               1210 aa            linear   PRI 15-MAR-2015
DEFINITION  epidermal growth factor receptor isoform a precursor [Homo
            sapiens].
ACCESSION   NP_005219
VERSION     NP_005219.2  GI:29725609
DBSOURCE    REFSEQ: accession NM_005228.3
KEYWORDS    RefSeq.
FEATURES             Location/Qualifiers
     source          1..1210
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="7"
                     /map="7p12"
     Protein         1..1210
                     /product="epidermal growth factor receptor isoform a
                     precursor"
                     /EC_number="2.7.10.1"
                     /note="avian erythroblastic leukemia viral (v-erb-b)
                     oncogene homolog; cell proliferation-inducing protein 61;
                     cell growth inhibiting protein 40; proto-oncogene
                     c-ErbB-1; receptor tyrosine-protein kinase erbB-1"
     sig_peptide     1..24
                     /inference="COORDINATES: ab initio prediction:SignalP:4.0"
                     /calculated_mol_wt=2283
     mat_peptide     25..1210
                     /product="epidermal growth factor receptor isoform a"
                     /calculated_mol_wt=132013
     Region          57..168
                     /region_name="Recep_L_domain"
                     /note="Receptor L domain; pfam01030"
                     /db_xref="CDD:250307"
     Region          75..300
                     /region_name="Approximate"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Region          185..337
                     /region_name="Furin-like"
                     /note="Furin-like cysteine rich region; pfam00757"
                     /db_xref="CDD:250112"
     Site            229
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:21487020};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Region          231..274
                     /region_name="FU"
                     /note="Furin-like repeats. Cysteine rich region. Exact
                     function of the domain is not known. Furin is a
                     serine-kinase dependent proprotein processor. Other
                     members of this family include endoproteases and cell
                     surface receptors; cd00064"
                     /db_xref="CDD:238021"
     Region          361..481
                     /region_name="Recep_L_domain"
                     /note="Receptor L domain; pfam01030"
                     /db_xref="CDD:250307"
     Region          390..600
                     /region_name="Approximate"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Region          505..637
                     /region_name="GF_recep_IV"
                     /note="Growth factor receptor domain IV; pfam14843"
                     /db_xref="CDD:258980"
     Region          506..559
                     /region_name="FU"
                     /note="Furin-like repeats. Cysteine rich region. Exact
                     function of the domain is not known. Furin is a
                     serine-kinase dependent proprotein processor. Other
                     members of this family include endoproteases and cell
                     surface receptors; cd00064"
                     /db_xref="CDD:238021"
     Region          558..>598
                     /region_name="FU"
                     /note="Furin-like repeats. Cysteine rich region. Exact
                     function of the domain is not known. Furin is a
                     serine-kinase dependent proprotein processor. Other
                     members of this family include endoproteases and cell
                     surface receptors; cd00064"
                     /db_xref="CDD:238021"
     Region          634..677
                     /region_name="TM_ErbB1"
                     /note="Transmembrane domain of Epidermal Growth Factor
                     Receptor or ErbB1, a Protein Tyrosine Kinase; cd12093"
                     /db_xref="CDD:213054"
     Site            order(644..646,648..653,656..657)
                     /site_type="other"
                     /note="heterodimer interface [polypeptide binding]"
                     /db_xref="CDD:213054"
     Site            646..668
                     /site_type="transmembrane region"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            678
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphothreonine, by PKC and PKD/PRKD1.
                     {ECO:0000269|PubMed:10523301}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Region          688..704
                     /region_name="Important for dimerization, phosphorylation
                     and activation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            693
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphothreonine, by PKD/PRKD1.
                     {ECO:0000269|PubMed:10523301, ECO:0000269|PubMed:16083266,
                     ECO:0000269|PubMed:18691976, ECO:0000269|PubMed:20068231,
                     ECO:0000269|PubMed:3138233}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            695
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18691976,
                     ECO:0000269|PubMed:3138233}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Region          704..1016
                     /region_name="PTKc_EGFR"
                     /note="Catalytic domain of the Protein Tyrosine Kinase,
                     Epidermal Growth Factor Receptor; cd05108"
                     /db_xref="CDD:270683"
     Region          712..968
                     /region_name="Pkinase_Tyr"
                     /note="Protein tyrosine kinase; pfam07714"
                     /db_xref="CDD:254379"
     Site            order(715..717,728..730,794..795,797,804..805,1009..1010)
                     /site_type="other"
                     /note="dimer interface [polypeptide binding]"
                     /db_xref="CDD:270683"
     Site            order(718..719,722..723,745,791,793,797,841..842,855,
                     876..880,885,889)
                     /site_type="active"
                     /db_xref="CDD:270683"
     Site            order(718..719,726,743,745,766,790..791,793,841..842,844,
                     855)
                     /site_type="other"
                     /note="ATP binding site [chemical binding]"
                     /db_xref="CDD:270683"
     Site            854..879
                     /site_type="other"
                     /note="activation loop (A-loop)"
                     /db_xref="CDD:270683"
     Site            order(876..880,885,889)
                     /site_type="other"
                     /note="polypeptide substrate binding site [polypeptide
                     binding]"
                     /db_xref="CDD:270683"
     Site            991
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:16083266,
                     ECO:0000269|PubMed:18669648, ECO:0000269|PubMed:20068231};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            995
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            998
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:18669648,
                     ECO:0000269|PubMed:19563760}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1016
                     /site_type="other"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Important for interaction with PIK3C2B; propagated
                     from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1016
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:19563760}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1026
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:16083266};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1039
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1041
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphothreonine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1042
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1064
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648,
                     ECO:0000269|PubMed:18691976, ECO:0000269|PubMed:20068231};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1069
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine. {ECO:0000305|PubMed:22888118};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1070
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:3138233};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1071
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:3138233};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1081
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18691976};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1092
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:12873986}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1110
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:12873986, ECO:0000269|PubMed:2543678};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1166
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648,
                     ECO:0000269|PubMed:18691976}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1172
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:17081983}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1197
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:17081983, ECO:0000269|PubMed:18691976,
                     ECO:0000269|PubMed:19563760, ECO:0000269|PubMed:19836242,
                     ECO:0000269|PubMed:20068231}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1199
                     /site_type="methylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Omega-N-methylarginine.
                     {ECO:0000269|PubMed:21258366}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     CDS             1..1210
                     /gene="EGFR"
                     /gene_synonym="ERBB; ERBB1; HER1; mENA; NISBD2; PIG61"
                     /coded_by="NM_005228.3:247..3879"
                     /note="isoform a precursor is encoded by transcript
                     variant 1"
                     /db_xref="CCDS:CCDS5514.1"
                     /db_xref="GeneID:1956"
                     /db_xref="HGNC:HGNC:3236"
                     /db_xref="MIM:131550"
ORIGIN      
        1 mrpsgtagaa llallaalcp asraleekkv cqgtsnkltq lgtfedhfls lqrmfnncev
       61 vlgnleityv qrnydlsflk tiqevagyvl ialntverip lenlqiirgn myyensyala
      121 vlsnydankt glkelpmrnl qeilhgavrf snnpalcnve siqwrdivss dflsnmsmdf
      181 qnhlgscqkc dpscpngscw gageencqkl tkiicaqqcs grcrgkspsd cchnqcaagc
      241 tgpresdclv crkfrdeatc kdtcpplmly npttyqmdvn pegkysfgat cvkkcprnyv
      301 vtdhgscvra cgadsyemee dgvrkckkce gpcrkvcngi gigefkdsls inatnikhfk
      361 nctsisgdlh ilpvafrgds fthtppldpq eldilktvke itgflliqaw penrtdlhaf
      421 enleiirgrt kqhgqfslav vslnitslgl rslkeisdgd viisgnknlc yantinwkkl
      481 fgtsgqktki isnrgensck atgqvchalc spegcwgpep rdcvscrnvs rgrecvdkcn
      541 llegeprefv enseciqchp eclpqamnit ctgrgpdnci qcahyidgph cvktcpagvm
      601 genntlvwky adaghvchlc hpnctygctg pglegcptng pkipsiatgm vgalllllvv
      661 algiglfmrr rhivrkrtlr rllqerelve pltpsgeapn qallrilket efkkikvlgs
      721 gafgtvykgl wipegekvki pvaikelrea tspkankeil deayvmasvd nphvcrllgi
      781 cltstvqlit qlmpfgclld yvrehkdnig sqyllnwcvq iakgmnyled rrlvhrdlaa
      841 rnvlvktpqh vkitdfglak llgaeekeyh aeggkvpikw malesilhri ythqsdvwsy
      901 gvtvwelmtf gskpydgipa seissilekg erlpqppict idvymimvkc wmidadsrpk
      961 freliiefsk mardpqrylv iqgdermhlp sptdsnfyra lmdeedmddv vdadeylipq
     1021 qgffsspsts rtpllsslsa tsnnstvaci drnglqscpi kedsflqrys sdptgalted
     1081 siddtflpvp eyinqsvpkr pagsvqnpvy hnqplnpaps rdphyqdphs tavgnpeyln
     1141 tvqptcvnst fdspahwaqk gshqisldnp dyqqdffpke akpngifkgs taenaeylrv
     1201 apqssefiga
//

1 answer:

Answer 0 (score: 0)

I suggest using Biopython:

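from Bio import SeqIO

file = "file.gb"
# Read the first record; next(...) works on both Python 2 and Python 3,
# whereas the .next() method exists only on Python 2.
gb = next(SeqIO.parse(file, "genbank"))
# Keep only "Site" features whose site_type qualifier is "phosphorylation".
phosphorylation_list = [f for f in gb.features if f.type == "Site" and
                        "phosphorylation" in f.qualifiers.get("site_type", [])]

for f in phosphorylation_list:
    # location.start is 0-based; location.end equals the 1-based position in the file.
    print((int(f.location.start), int(f.location.end)))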

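You get the following, where the first number is the 0-based start and the second is the 1-based position given in the file: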

(228, 229)
(677, 678)
(692, 693)
(694, 695)
(990, 991)
(994, 995)
(997, 998)
(1015, 1016)
(1025, 1026)
(1038, 1039)
(1040, 1041)
(1041, 1042)
(1063, 1064)
(1068, 1069)
(1069, 1070)
(1070, 1071)
(1080, 1081)
(1091, 1092)
(1109, 1110)
(1165, 1166)
(1171, 1172)
(1196, 1197)
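
Since the question asks for a solution without Biopython, here is a minimal sketch using only the standard library. It is not a full GenBank/GenPept parser: it assumes the fixed-column layout of the record above (feature keys in the first 21 columns, qualifiers starting with "/"), it only handles simple locations such as `229`, `57..168` or `558..>598` (compound `order(...)` locations are skipped), and repeated qualifiers such as `/db_xref` overwrite each other. All function names (`parse_genpept`, `simple_span`, `region_records`, `phosphorylation_sites`, `write_region_files`) are made up for this illustration.

```python
import re


def parse_genpept(text):
    """Return (features, sequence) parsed from GenPept flat-file text."""
    features = []      # each item: {"type", "location", "qualifiers"}
    seq_parts = []
    section = None
    last_key = None    # qualifier currently being continued, if any
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif line.startswith("//"):
            break
        elif section == "features":
            key, value = line[:21].strip(), line[21:].strip()
            if key:                       # a feature key starts a new feature
                features.append({"type": key, "location": value, "qualifiers": {}})
                last_key = None
            elif value.startswith("/"):   # a new qualifier, e.g. /site_type="..."
                name, _, val = value[1:].partition("=")
                features[-1]["qualifiers"][name] = val.strip('"')
                last_key = name
            elif last_key is None:        # location wrapped onto the next line
                features[-1]["location"] += value
            else:                         # qualifier value wrapped onto the next line
                features[-1]["qualifiers"][last_key] += " " + value.strip('"')
        elif section == "origin":
            # Drop the position numbers and spaces, keep only residue letters.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return features, "".join(seq_parts).upper()


def simple_span(location):
    """Return 1-based inclusive (start, end) for 'N' or 'N..M', else None."""
    m = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", location)
    if not m:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return start, end


def region_records(features, seq, accession):
    """Return (FASTA header, subsequence) for every simple 'Region' feature."""
    records = []
    for f in features:
        span = simple_span(f["location"])
        if f["type"] == "Region" and span:
            start, end = span
            name = f["qualifiers"].get("region_name", "")
            header = ">%s_%d-%d %s" % (accession, start, end, name)
            records.append((header, seq[start - 1:end]))
    return records


def phosphorylation_sites(features, seq):
    """Return (amino acid, 1-based position) for every phosphorylation Site."""
    sites = []
    for f in features:
        span = simple_span(f["location"])
        if (f["type"] == "Site" and span
                and f["qualifiers"].get("site_type") == "phosphorylation"):
            pos = span[0]
            sites.append((seq[pos - 1], pos))
    return sites


def write_region_files(records, prefix="region"):
    """Write each (header, subsequence) pair to its own FASTA file."""
    for i, (header, subseq) in enumerate(records, 1):
        with open("%s_%d.fasta" % (prefix, i), "w") as handle:
            handle.write(header + "\n" + subseq + "\n")
```

To use it, read the file into a string, call `parse_genpept` on it, then pass the result to `region_records` (followed by `write_region_files`) and to `phosphorylation_sites`. For the record in the question, the first phosphorylation entry should be `('S', 229)`, which agrees with the "Phosphoserine" note on that site. Unlike the Biopython output above, the positions here are reported 1-based, exactly as they appear in the file.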