Removing the 'TER' keyword from a PDB file when merging two PDB chains

Time: 2017-04-04 15:08:38

Tags: pdb biopython

Goal: merge two chains from a PDB file using Biopython. In the example below, I want to merge chains A and B into a single chain C.

ATOM   1133  N   VAL A 100      12.484 -30.583 106.831  1.00 30.28           N
ATOM   1134  CA  VAL A 100      11.430 -31.194 106.033  1.00 34.41           C
ATOM   1135  C   VAL A 100      11.985 -32.402 105.259  1.00 39.25           C
ATOM   1136  O   VAL A 100      11.248 -33.126 104.568  1.00 46.37           O
ATOM   1137  CB  VAL A 100      10.822 -30.174 105.029  1.00 35.16           C
ATOM   1138  CG1 VAL A 100      10.159 -29.020 105.767  1.00 36.95           C
ATOM   1139  CG2 VAL A 100      11.865 -29.669 104.007  1.00 30.60           C
TER
ATOM   1141  N   GLU B   1      12.344 -43.792 102.987  1.00 64.25           N
ATOM   1142  CA  GLU B   1      11.253 -42.785 103.240  1.00 66.15           C
ATOM   1143  C   GLU B   1      11.742 -41.350 102.948  1.00 65.40           C
ATOM   1144  O   GLU B   1      12.011 -40.595 103.895  1.00 65.31           O
ATOM   1145  CB  GLU B   1      10.779 -42.877 104.712  1.00 67.04           C

The following lines of code relabel them as a single chain, but they do not remove the TER keyword.

merged_chains=['A', 'B']
new_rsd_num = 1
for model in structure:
  for chain in model:
    if chain.id in merged_chains:
      chain.id = 'C'
      for residue in chain:
        residue.id = (' ', new_rsd_num, ' ')
        new_rsd_num += 1

This code produces the following output, which still contains the TER keyword between the two chains.

...
ATOM   1133  N   VAL C 100      12.484 -30.583 106.831  1.00 30.28           N
ATOM   1134  CA  VAL C 100      11.430 -31.194 106.033  1.00 34.41           C
ATOM   1135  C   VAL C 100      11.985 -32.402 105.259  1.00 39.25           C
ATOM   1136  O   VAL C 100      11.248 -33.126 104.568  1.00 46.37           O
ATOM   1137  CB  VAL C 100      10.822 -30.174 105.029  1.00 35.16           C
ATOM   1138  CG1 VAL C 100      10.159 -29.020 105.767  1.00 36.95           C
ATOM   1139  CG2 VAL C 100      11.865 -29.669 104.007  1.00 30.60           C
TER
ATOM   1141  N   GLU C 101      12.344 -43.792 102.987  1.00 64.25           N
ATOM   1142  CA  GLU C 101      11.253 -42.785 103.240  1.00 66.15           C
ATOM   1143  C   GLU C 101      11.742 -41.350 102.948  1.00 65.40           C
ATOM   1144  O   GLU C 101      12.011 -40.595 103.895  1.00 65.31           O
ATOM   1145  CB  GLU C 101      10.779 -42.877 104.712  1.00 67.04           C
...

But the output should look like the following, with the TER keyword removed.

...
ATOM   1133  N   VAL C 100      12.484 -30.583 106.831  1.00 30.28           N
ATOM   1134  CA  VAL C 100      11.430 -31.194 106.033  1.00 34.41           C
ATOM   1135  C   VAL C 100      11.985 -32.402 105.259  1.00 39.25           C
ATOM   1136  O   VAL C 100      11.248 -33.126 104.568  1.00 46.37           O
ATOM   1137  CB  VAL C 100      10.822 -30.174 105.029  1.00 35.16           C
ATOM   1138  CG1 VAL C 100      10.159 -29.020 105.767  1.00 36.95           C
ATOM   1139  CG2 VAL C 100      11.865 -29.669 104.007  1.00 30.60           C
ATOM   1141  N   GLU C 101      12.344 -43.792 102.987  1.00 64.25           N
ATOM   1142  CA  GLU C 101      11.253 -42.785 103.240  1.00 66.15           C
ATOM   1143  C   GLU C 101      11.742 -41.350 102.948  1.00 65.40           C
ATOM   1144  O   GLU C 101      12.011 -40.595 103.895  1.00 65.31           O
ATOM   1145  CB  GLU C 101      10.779 -42.877 104.712  1.00 67.04           C
...

Any ideas on how to remove the TER keyword with Biopython?

1 answer:

Answer 0 (score: 1)

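The residues still belong to their original chain objects: overwriting the id does not change which residues belong to chain A. PDBIO writes a TER record at the end of every chain object, so two chain objects produce a TER between them even when they carry the same id.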
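To illustrate why relabelling alone is not enough, here is a toy model in plain Python. It is not Biopython's code: `ToyChain` and `write_records` are hypothetical names invented for this sketch, and it assumes only that the writer closes each chain object with a TER line, whatever id the chain carries.

```python
# Toy model (not Biopython): a "chain" is a container with an id label.
class ToyChain:
    def __init__(self, chain_id, residues):
        self.id = chain_id
        self.residues = list(residues)


def write_records(chains):
    """Emit one line per residue and one TER line per chain object."""
    lines = []
    for chain in chains:
        for name in chain.residues:
            lines.append(f"ATOM {name} {chain.id}")
        lines.append("TER")  # written per container, regardless of its id
    return lines


a = ToyChain("A", ["VAL"])
b = ToyChain("B", ["GLU"])

# Relabelling both containers as 'C' still leaves two containers,
# so the writer still emits two TER lines.
a.id = b.id = "C"
relabelled = write_records([a, b])

# Moving B's residues into A and dropping B leaves one container,
# so only the closing TER line remains.
a.residues.extend(b.residues)
merged = write_records([a])
```

In this sketch, `relabelled` keeps a TER line between the two residues, while `merged` has only the one that closes the single remaining chain.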

You can add the residues from chain B to chain A and then delete chain B.

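#read a PDB file with two chains
from Bio import PDB
pdbl = PDB.PDBList()
#request the legacy PDB format explicitly (newer Biopython releases default to mmCIF)
#and keep the path that retrieve_pdb_file returns instead of hard-coding it
pdb_path = pdbl.retrieve_pdb_file('5K04', file_format='pdb')
parser = PDB.PDBParser()
structure = parser.get_structure('5K04', pdb_path)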

#get all chains
chains = list()
for model in structure:
  for chain in model:
    chains.append(chain)

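#get the first free residue number after the last residue in the first chain
first_free_id = int(chains[0].get_unpacked_list()[-1].id[1]) + 1

#move all residues from the 2nd chain into the first one
#(iterate over a copy, because detaching changes the chain's child list)
for i, residue in enumerate(list(chains[1])):
    #detach first, so the new id cannot clash with ids still present in the 2nd chain
    chains[1].detach_child(residue.id)
    old_id = list(residue.id)
    old_id[1] = first_free_id + i
    #assign the new residue number
    residue.id = tuple(old_id)
    #add the residue to the first chain
    chains[0].add(residue)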

#now delete all chains but the first
#(iterate over a copy, since detaching modifies the model while looping)
for model in structure:
    for chain in list(model):
        if chain.id != chains[0].id:
            model.detach_child(chain.id)

#save the merged chains
pdb_io = PDB.PDBIO()
pdb_io.set_structure(structure)
pdb_io.save('5k04_merged.pdb')
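
One way to confirm the result is to inspect the saved file as text. The sketch below is plain Python and relies only on the fixed-column PDB layout (record name in columns 1-6, chain ID in column 22). The helper name `summarize_pdb_lines` is made up for this example, and the sample lines come from the question rather than from 5K04.

```python
from collections import Counter


def summarize_pdb_lines(lines):
    """Return (chain IDs seen on ATOM/HETATM records, number of TER records)."""
    chain_ids = Counter()
    ter_count = 0
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            chain_ids[line[21]] += 1  # column 22 holds the chain ID
        elif record == "TER":
            ter_count += 1
    return chain_ids, ter_count


# Sample taken from the unwanted output in the question: one chain ID,
# but a TER record still sits between the two residues.
sample = [
    "ATOM   1139  CG2 VAL C 100      11.865 -29.669 104.007  1.00 30.60           C",
    "TER",
    "ATOM   1141  N   GLU C 101      12.344 -43.792 102.987  1.00 64.25           N",
]
chain_ids, ter_count = summarize_pdb_lines(sample)
```

For a real file, the same function can be fed an open file handle, for example `with open('5k04_merged.pdb') as fh: summarize_pdb_lines(fh)`. After a successful merge one would expect a single chain ID and a single TER record, the one that closes the merged chain.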