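Biopython: resseq does not match the PDB file

Time: 2017-08-02 16:36:15

Tags: python bioinformatics biopython

I have a PDB file from which I need to extract the residue sequence numbers (resseqs). Based on a manual inspection of the first few lines of the PDB file (pasted below), I think the resseqs should be [22, 23, ...]. However, Biopython's Bio.PDB module suggests otherwise (output attached below). I would like to know whether this is a Biopython bug or whether I am misunderstanding the PDB format.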

ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  
ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C  
ATOM      3  C   GLY A  22      80.438  89.415  58.391  1.00 21.89           C  
ATOM      4  O   GLY A  22      80.362  90.202  57.440  1.00 23.18           O  
ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  
ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C  
ATOM      7  C   LEU A  23      83.288  89.020  57.162  1.00 22.41           C  
ATOM      8  O   LEU A  23      82.889  87.923  56.788  1.00 22.93           O  
ATOM      9  CB  LEU A  23      83.973  89.232  59.560  1.00 20.97           C  
ATOM     10  CG  LEU A  23      84.225  87.818  60.062  1.00 13.32           C  
ATOM     11  CD1 LEU A  23      85.448  87.888  60.939  1.00 15.24           C  
ATOM     12  CD2 LEU A  23      83.035  87.258  60.829  1.00 12.21           C

The code I used to extract the resseqs:

...
for i in chain:
    print(i.get_full_id())

OUT:('pdb', 0, 'A', (' ', 2, ' '))
    ('pdb', 0, 'A', (' ', 3, ' '))
...

1 answer:

Answer 0 (score: 3)

From the documentation of Bio.PDB.Entity.get_full_id:

def get_full_id(self):
    """Return the full id.

    The full id is a tuple containing all id's starting from
    the top object (Structure) down to the current object. A full id for
    a Residue object e.g. is something like:

    ("1abc", 0, "A", (" ", 10, "A"))

    This corresponds to:

    Structure with id "1abc"
    Model with id 0
    Chain with id "A"
    Residue with id (" ", 10, "A")

    The Residue id indicates that the residue is not a hetero-residue
    (or a water) because it has a blank hetero field, that its sequence
    identifier is 10 and its insertion code "A".
    """
    # The function implementation below here ...

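I assume you are iterating over the atoms of the chain rather than over its residues, so you get the full id of each Atom instead of each Residue.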
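
To make the difference between hierarchy levels concrete, here is a minimal sketch, assuming Biopython is installed. It parses three of the ATOM records from the question out of an in-memory string and collects the full ids at the residue level and at the atom level. The name SAMPLE_PDB and the structure id 'test' are illustrative choices, not part of the original post.

```python
from io import StringIO

from Bio.PDB import PDBParser

# Three ATOM records copied from the question (fixed-width PDB format).
SAMPLE_PDB = (
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  \n"
    "ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C  \n"
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  \n"
)

# QUIET=True suppresses parser warnings about the truncated sample.
parser = PDBParser(QUIET=True)
structure = parser.get_structure("test", StringIO(SAMPLE_PDB))

# Full ids collected at two different levels of the hierarchy.
residue_ids = [residue.get_full_id() for residue in structure.get_residues()]
atom_ids = [atom.get_full_id() for atom in structure.get_atoms()]

for full_id in residue_ids:
    print("residue:", full_id)
for full_id in atom_ids:
    print("atom:   ", full_id)
```

A residue-level full id has four elements (structure, model, chain, residue id), while an atom-level full id carries one additional element identifying the atom. Counting the elements is therefore a quick way to check which level a loop is actually iterating over.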

If you save the example residues in a file named struct.pdb and run the following code, you get the correct ids:

>>> from Bio.PDB import PDBParser
>>> structure = PDBParser().get_structure('test', 'struct.pdb')
>>> for residue in structure.get_residues():
...    print(residue.get_full_id())
('test', 0, 'A', (' ', 22, ' '))
('test', 0, 'A', (' ', 23, ' '))
>>> resseqs = [residue.id[1] for residue in structure.get_residues()]
>>> print(resseqs)
[22, 23]
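
As a cross-check that does not depend on Biopython, the manual inspection described in the question can be sketched with the standard library alone. In the fixed-width PDB format, the residue sequence number (resSeq) of an ATOM or HETATM record occupies columns 23-26 (1-based). The helper name extract_resseqs is hypothetical, and this sketch deliberately ignores chain ids and insertion codes.

```python
# A few ATOM records copied from the question (fixed-width PDB format).
PDB_LINES = [
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  ",
    "ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C  ",
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  ",
    "ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C  ",
]


def extract_resseqs(lines):
    """Return resSeq values in file order, collapsing consecutive repeats.

    Hypothetical helper for illustration; chain ids and insertion codes
    are not taken into account.
    """
    resseqs = []
    for line in lines:
        # Only coordinate records carry a resSeq field.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resseq = int(line[22:26])  # columns 23-26 (1-based) hold resSeq
        if not resseqs or resseqs[-1] != resseq:
            resseqs.append(resseq)
    return resseqs


print(extract_resseqs(PDB_LINES))  # prints [22, 23]
```

If this column-based reading and Bio.PDB disagree for a given file, it is worth confirming that both are being run against the same file.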