How do I get the length of a protein chain from a PDB file with Biopython?

Time: 2016-06-25 10:15:59

Tags: biopython

I first tried it this way:

print(len(structure[0]['A']))

Then this way:

for model in structure:
    for residue in model.get_residues():
        if PDB.is_aa(residue):
            x += 1

But neither of them seems to work...

3 answers:

Answer 0 (score: 1)

Your code should work and give you the correct result.

from Bio import PDB

parser = PDB.PDBParser()

pdb1 = './1bfg.pdb'
structure = parser.get_structure("1bfg", pdb1)

# Approach 1: check the hetero flag (first field of the residue id).
# A blank flag marks a standard residue; waters and other hetero groups are flagged.
res_no = 0
non_resi = 0
for model in structure:
    for chain in model:
        for r in chain.get_residues():
            if r.id[0] == ' ':
                res_no += 1
            else:
                non_resi += 1

print("Residues:  %i" % res_no)
print("Other:     %i" % non_resi)

# Approach 2: let PDB.is_aa() decide whether a residue is an amino acid.
res_no2 = 0
non_resi2 = 0
for model in structure:
    for residue in model.get_residues():
        if PDB.is_aa(residue):
            res_no2 += 1
        else:
            non_resi2 += 1

print("Residues2: %i" % res_no2)
print("Other2:    %i" % non_resi2)

Output:

Residues:  126
Other:     99
Residues2: 126
Other2:    99

Your statement

print (len(structure[0]['A']))

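gives the total number of residues in the chain (225), which in this case is all the amino acids plus the water molecules.

Compared with a manual check in PyMOL, these numbers appear to be correct.

What specific error message do you get, or what output do you expect? Is this about a particular PDB file?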
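If what you need is one length per chain rather than a single total, the hetero-flag loop above can be grouped by chain ID. Here is a minimal sketch: `chain_lengths` and `toy_structure` are names I made up, and the tiny structure is built by hand (hypothetical data) only so the snippet runs without a file.

```python
from Bio.PDB.Chain import Chain
from Bio.PDB.Model import Model
from Bio.PDB.Residue import Residue
from Bio.PDB.Structure import Structure


def chain_lengths(structure, model_id=0):
    """Return {chain ID: number of residues whose hetero flag is blank}."""
    return {
        chain.id: sum(1 for residue in chain if residue.id[0] == " ")
        for chain in structure[model_id]
    }


def toy_structure():
    """Build a tiny hand-made structure (hypothetical data, no file needed)."""
    structure = Structure("toy")
    model = Model(0)
    structure.add(model)
    layout = {"A": ["ALA", "GLY", "SER"], "B": ["LYS", "GLU"]}
    for chain_id, names in layout.items():
        chain = Chain(chain_id)
        model.add(chain)
        for number, name in enumerate(names, start=1):
            # Residue id is (hetero flag, sequence number, insertion code).
            chain.add(Residue((" ", number, " "), name, " "))
        # One water per chain; Biopython marks waters with hetero flag "W".
        chain.add(Residue(("W", 100, " "), "HOH", " "))
    return structure


lengths = chain_lengths(toy_structure())
print(lengths)  # {'A': 3, 'B': 2}
```

With a real file you would pass the `structure` returned by `parser.get_structure(...)` instead of the toy one. Only the first model is counted, which avoids multiplying the length by the number of models in an NMR ensemble.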

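Because PDB files mainly store the coordinates of the atoms that were actually resolved, the full structure is not always available from them. An alternative is to use the cif file: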

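from Bio.PDB.MMCIF2Dict import MMCIF2Dict

cif_path = './1bfg.cif'
m = MMCIF2Dict(cif_path)

key = '_entity_poly.pdbx_seq_one_letter_code'
if key in m:
    full_structure = m[key]
    # Depending on the Biopython version, the value is a plain string or a
    # list with one entry per polymer entity; take the first entry if it is a list.
    if not isinstance(full_structure, str):
        full_structure = full_structure[0]
    # Multi-line mmCIF text fields keep their line breaks; drop them before counting.
    full_structure = full_structure.replace('\n', '')
    print('Full structure:')
    print(full_structure)
    print(len(full_structure))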

Output:

Full structure:
PALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGVSANRYLAMKEDGRLLASKSVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAILFLPMSAKS
146

For multiple chains:

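from Bio.PDB.MMCIF2Dict import MMCIF2Dict

cif_path = './4hlu.cif'
m = MMCIF2Dict(cif_path)

if '_entity_poly.pdbx_seq_one_letter_code' in m:
    # Assumes a Biopython release in which both values are lists,
    # with one entry per polymer entity.
    sequences = m['_entity_poly.pdbx_seq_one_letter_code']
    chains = m['_entity_poly.pdbx_strand_id']
    # Pair each entity's strand IDs with its sequence by position.
    # A strand entry may list several chain IDs (e.g. 'A,B') that share one sequence.
    for c, seq in zip(chains, sequences):
        print('Chain %s' % c)
        print('Sequence: %s' % seq.replace('\n', ''))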
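For a legacy .pdb file, the full deposited sequence (including residues without resolved coordinates) is listed in the SEQRES records, when the file has them. A rough per-chain length can be read from there without touching the coordinate section. Below is a minimal sketch using only the standard library. The function name `seqres_lengths` and the sample lines are made up for illustration, and the slices follow the fixed-width SEQRES layout (chain ID in column 12, residue names in columns 20-70).

```python
def seqres_lengths(lines):
    """Return {chain ID: residue count} from the SEQRES records in `lines`."""
    residues_by_chain = {}
    for line in lines:
        if line.startswith("SEQRES"):
            chain_id = line[11]            # column 12
            names = line[19:70].split()    # columns 20-70, up to 13 names per record
            residues_by_chain.setdefault(chain_id, []).extend(names)
    return {cid: len(names) for cid, names in residues_by_chain.items()}


# Hypothetical sample records: chain A spans two lines, chain B fits in one.
sample = [
    "SEQRES   1 A   15  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
    "SEQRES   2 A   15  TYR GLN",
    "SEQRES   1 B    3  PHE VAL ASN",
    "ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N",
]

seqres_counts = seqres_lengths(sample)
print(seqres_counts)  # {'A': 15, 'B': 3}
```

With a real file, pass an open file handle (for example `open('./1bfg.pdb')`) instead of the sample list. Files written by modelling tools often have no SEQRES records, in which case the result is simply empty. Biopython's `Bio.SeqIO` can also read these records through its "pdb-seqres" format.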

Answer 1 (score: 0)

Simply:

from Bio import PDB
from Bio.PDB import PDBParser

structure = PDBParser().get_structure("1bfg", "1bfg.pdb")

# Print the number of amino-acid residues in each chain.
for chain in structure.get_chains():
    print(len([res for res in chain.get_residues() if PDB.is_aa(res)]))

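Answer 2 (score: 0)

I appreciate Peters' answer, but I also found that res.id[0] == " " is more robust, for example with HIE. PDB.is_aa() does not recognize HIE as an amino acid, even though HIE is a histidine protonated at the ε-nitrogen. So I suggest: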

from Bio import PDB
parser = PDB.PDBParser()

pdb1 = './1bfg.pdb'
structure = parser.get_structure("1bfg", pdb1)
res_no = 0
non_resi = 0

for model in structure:
    for chain in model:
        for r in chain.get_residues():
            if r.id[0] == ' ':
                res_no +=1
            else:
                non_resi +=1

print ("Residues:  %i" % (res_no))
print ("Other:     %i" % (non_resi))
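The two tests can also be combined, since each one covers a case the other misses. The blank hetero flag catches renamed residues written as ATOM records (such as HIE), while `is_aa(..., standard=False)` is meant to catch modified amino acids that a file stores as HETATM. This is only a sketch of that idea: `counts_as_chain_residue` is a name I made up, and the three residues are built by hand as hypothetical data rather than read from 1bfg.

```python
from Bio.PDB import is_aa
from Bio.PDB.Residue import Residue


def counts_as_chain_residue(residue):
    """Decide whether a residue should count toward the chain length."""
    hetero_flag = residue.id[0]
    if hetero_flag == " ":
        return True    # ATOM record: counted whatever the residue name is
    if hetero_flag == "W":
        return False   # water
    # Other hetero groups ("H_xxx"): count only if Biopython knows the
    # residue name as a (possibly modified) amino acid.
    return is_aa(residue, standard=False)


# Hand-made residues; the id is (hetero flag, sequence number, insertion code).
residues = [
    Residue((" ", 1, " "), "ALA", " "),
    Residue((" ", 2, " "), "HIE", " "),
    Residue(("W", 101, " "), "HOH", " "),
]

chain_length = sum(1 for r in residues if counts_as_chain_residue(r))
print(chain_length)  # 2
```

In the counting loop above, this would replace the `r.id[0] == ' '` condition. Whether a given modified residue is recognized depends on the residue table shipped with your Biopython version.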