Question

I first tried it this way:

Private Sub donations_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
Dim total As String = 0
    For i As Integer = 0 To dondgv.RowCount - 1
        total += dondgv.Rows(i).Cells(2).Value
    Next
    Label4.Text = total

 End Sub



End Class

And then this way:

for model in structure:
    for residue in model.get_residues():
        if PDB.is_aa(residue):
            x += 1

But neither of them seems to work...

Answer 1

Your code should work and give you the correct result.

from Bio import PDB
parser = PDB.PDBParser()

pdb1 ='./1bfg.pdb' 
structure = parser.get_structure("1bfg", pdb1) 
model = structure[0]
res_no = 0
non_resi = 0

for model in structure:
    for chain in model:
        for r in chain.get_residues():
            if r.id[0] == ' ':
                res_no +=1
            else:
                non_resi +=1

print ("Residues:  %i" % (res_no))
print ("Other:     %i" % (non_resi))
res_no2 = 0
non_resi2 = 0
for model in structure:
    for residue in model.get_residues():
        if PDB.is_aa(residue):
            res_no2 += 1

        else:
            non_resi2 += 1

print ("Residues2: %i" % (res_no2))
print ("Other2:    %i" % (non_resi2))

Output:

Residues:  126
Other:     99
Residues2: 126
Other2:    99

Your statement

print (len(structure[0]['A']))

gives the total number of residues (225), which in this case is all the amino acids plus the water molecules (126 + 99).

Compared with a manual check in PyMol, these numbers appear to be correct.
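How the three numbers relate can be illustrated without a structure file. In Biopython a residue id is a tuple whose first element is the hetero flag: a blank for residues read from ATOM records, 'W' for water, and 'H_' plus the residue name for other hetero residues. The sketch below uses made-up ids (126 blank-flag residues and 99 waters, mirroring the counts above) rather than the real 1bfg residues; fake_ids is purely illustrative.

```python
from collections import Counter

# Hypothetical residue ids shaped like Biopython's (hetero flag, number, insertion code).
# The counts mirror the 1bfg output above; the numbering itself is invented.
fake_ids = [(" ", n, " ") for n in range(1, 127)]      # 126 standard residues
fake_ids += [("W", n, " ") for n in range(201, 300)]   # 99 water residues

# Tally residues by hetero flag, then split into standard and everything else.
flags = Counter(res_id[0] for res_id in fake_ids)
standard = flags[" "]
other = len(fake_ids) - standard

print("Residues:", standard)
print("Other:   ", other)
print("All residues (what len() of the chain reports):", len(fake_ids))
```

Taking the length of a chain counts every entry regardless of flag, which is why it reports the sum of both groups.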

What specific error message are you getting, or what output do you expect? Is it a particular PDB file?

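Because PDB files mainly store the coordinates of the resolved atoms, the complete structure is not always available from them. An alternative is to use the cif file.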

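from Bio.PDB.MMCIF2Dict import MMCIF2Dict

pdb1 = './1bfg.cif'

m = MMCIF2Dict(pdb1)

key = '_entity_poly.pdbx_seq_one_letter_code'
if key in m:
    print('Full structure:')
    # Recent Biopython releases return a list with one sequence per entity.
    for full_structure in m[key]:
        # Multi-line sequences keep their line breaks; drop them before counting.
        full_structure = full_structure.replace('\n', '')
        print(full_structure)
        print(len(full_structure))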

Output:

Full structure:
PALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGVSANRYLAMKEDGRLLASKSVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAILFLPMSAKS
146

For multiple chains:

from Bio.PDB.MMCIF2Dict import MMCIF2Dict

pdb1 = './4hlu.cif'

m = MMCIF2Dict(pdb1)

if '_entity_poly.pdbx_seq_one_letter_code' in m:
    full_structure = m['_entity_poly.pdbx_seq_one_letter_code']
    chains = m['_entity_poly.pdbx_strand_id']
    # Pair each chain entry with its sequence; zip avoids the repeated index() lookup.
    for c, seq in zip(chains, full_structure):
        print('Chain %s' % (c))
        print('Sequence: %s' % (seq))
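Note that in mmCIF files the _entity_poly.pdbx_strand_id field holds one entry per entity, and an entity shared by several chains lists them comma-separated (for example A,B). A minimal sketch of expanding that into per-chain lengths follows. It assumes a Biopython release in which MMCIF2Dict returns lists; cif_dict is a hand-written stand-in for that dictionary, and its sequences are invented, not taken from 4hlu.

```python
def chain_lengths(cif_dict):
    """Map each chain ID to the length of its full (deposited) sequence."""
    seqs = cif_dict["_entity_poly.pdbx_seq_one_letter_code"]
    strands = cif_dict["_entity_poly.pdbx_strand_id"]
    lengths = {}
    for seq, strand_ids in zip(seqs, strands):
        # Multi-line sequences keep their line breaks, so strip whitespace first.
        clean = "".join(seq.split())
        # One entity may belong to several chains, e.g. "A,B".
        for chain_id in strand_ids.split(","):
            lengths[chain_id.strip()] = len(clean)
    return lengths


# Invented example data: one entity shared by chains A and B, one for chain C.
cif_dict = {
    "_entity_poly.pdbx_seq_one_letter_code": ["MKTAYIAKQR\nQISFVK", "GSHMLE"],
    "_entity_poly.pdbx_strand_id": ["A,B", "C"],
}
print(chain_lengths(cif_dict))
```

One caveat: in this field non-standard residues are written in parentheses, such as (MSE), so a plain character count would overcount for entries that contain them.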

Answer 2

Simply:

from Bio.PDB import PDBParser, is_aa

pdb = PDBParser().get_structure("1bfg", "1bfg.pdb")

# Print the number of amino acid residues in each chain.
for chain in pdb.get_chains():
    print(len([res for res in chain.get_residues() if is_aa(res)]))
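As a cross-check that does not depend on Biopython, residues per chain can also be counted straight from the fixed-width ATOM records of the PDB format (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). This is only a sketch: it looks at the first model only and ignores alternate locations, and the demo_line helper and the sample records are invented for illustration.

```python
from collections import defaultdict


def residues_per_chain(lines):
    """Count distinct residues per chain from ATOM records (first model only)."""
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if line.startswith("ATOM"):
            chain_id = line[21]
            # Residue number plus insertion code identifies a residue within a chain.
            seen[chain_id].add((line[22:26].strip(), line[26:27]))
    return {chain: len(ids) for chain, ids in seen.items()}


def demo_line(record, serial, atom, resname, chain, resseq):
    """Hypothetical helper: build the first 27 columns of a PDB coordinate record."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4} "


# Invented records: two atoms of one residue, an HIE residue, a second chain, one water.
sample = [
    demo_line("ATOM", 1, " N", "PRO", "A", 20),
    demo_line("ATOM", 2, " CA", "PRO", "A", 20),
    demo_line("ATOM", 3, " N", "HIE", "A", 21),
    demo_line("ATOM", 4, " N", "ALA", "B", 1),
    demo_line("HETATM", 5, " O", "HOH", "A", 201),
]
print(residues_per_chain(sample))
```

ATOM records also hold nucleic acid residues, so a structure that mixes protein and DNA or RNA would need a further filter on the residue name.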

Answer 3

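I appreciate Peters' answer, but I also realized that res.id[0] == " " is more robust (for example with HIE). PDB.is_aa() does not detect HIE as an amino acid, even though HIE is a histidine protonated at the ε-nitrogen. So I suggest: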

from Bio import PDB
parser = PDB.PDBParser()

pdb1 ='./1bfg.pdb' 
structure = parser.get_structure("1bfg", pdb1)
model = structure[0]
res_no = 0
non_resi = 0

for model in structure:
    for chain in model:
        for r in chain.get_residues():
            if r.id[0] == ' ':
                res_no +=1
            else:
                non_resi +=1

print ("Residues:  %i" % (res_no))
print ("Other:     %i" % (non_resi))
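The difference between the two tests can be shown with a toy example. The name set below is a hand-written list of the 20 standard three-letter codes standing in for a name-based check; it is not Biopython's actual lookup table, and the residue tuples are invented.

```python
# The 20 standard amino acid codes (a stand-in for a name-based test, not Biopython's table).
STANDARD_NAMES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Invented residues as (residue name, hetero flag) pairs; HIE comes from an ATOM record.
residues = [("ALA", " "), ("HIE", " "), ("GLY", " "), ("HOH", "W")]

# Name-based test: HIE is not in the set, so it is missed.
by_name = sum(1 for name, _ in residues if name in STANDARD_NAMES)
# Flag-based test: every residue with a blank hetero flag is counted.
by_flag = sum(1 for _, flag in residues if flag == " ")

print("Counted by name:", by_name)
print("Counted by flag:", by_flag)
```

The flag-based count has its own blind spot: modified residues stored as HETATM records, selenomethionine (MSE) being a common case, carry an 'H_' flag and would be skipped.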

How do I get the length of a protein chain from a PDB file with Biopython?
