Goal: merge two chains from a PDB file using Biopython. In the example below, I want to merge chains A and B into a single chain C.
ATOM 1133 N VAL A 100 12.484 -30.583 106.831 1.00 30.28 N
ATOM 1134 CA VAL A 100 11.430 -31.194 106.033 1.00 34.41 C
ATOM 1135 C VAL A 100 11.985 -32.402 105.259 1.00 39.25 C
ATOM 1136 O VAL A 100 11.248 -33.126 104.568 1.00 46.37 O
ATOM 1137 CB VAL A 100 10.822 -30.174 105.029 1.00 35.16 C
ATOM 1138 CG1 VAL A 100 10.159 -29.020 105.767 1.00 36.95 C
ATOM 1139 CG2 VAL A 100 11.865 -29.669 104.007 1.00 30.60 C
TER
ATOM 1141 N GLU B 1 12.344 -43.792 102.987 1.00 64.25 N
ATOM 1142 CA GLU B 1 11.253 -42.785 103.240 1.00 66.15 C
ATOM 1143 C GLU B 1 11.742 -41.350 102.948 1.00 65.40 C
ATOM 1144 O GLU B 1 12.011 -40.595 103.895 1.00 65.31 O
ATOM 1145 CB GLU B 1 10.779 -42.877 104.712 1.00 67.04 C
The following lines merge them into a single chain, but they do not remove the TER record.
merged_chains = ['A', 'B']
new_rsd_num = 1
for model in structure:
    for chain in model:
        if chain.id in merged_chains:
            chain.id = 'C'
            for residue in chain:
                residue.id = (' ', new_rsd_num, ' ')
                new_rsd_num += 1
This code produces the following output, which still contains a TER record between the two chains.
...
ATOM 1133 N VAL C 100 12.484 -30.583 106.831 1.00 30.28 N
ATOM 1134 CA VAL C 100 11.430 -31.194 106.033 1.00 34.41 C
ATOM 1135 C VAL C 100 11.985 -32.402 105.259 1.00 39.25 C
ATOM 1136 O VAL C 100 11.248 -33.126 104.568 1.00 46.37 O
ATOM 1137 CB VAL C 100 10.822 -30.174 105.029 1.00 35.16 C
ATOM 1138 CG1 VAL C 100 10.159 -29.020 105.767 1.00 36.95 C
ATOM 1139 CG2 VAL C 100 11.865 -29.669 104.007 1.00 30.60 C
TER
ATOM 1141 N GLU C 101 12.344 -43.792 102.987 1.00 64.25 N
ATOM 1142 CA GLU C 101 11.253 -42.785 103.240 1.00 66.15 C
ATOM 1143 C GLU C 101 11.742 -41.350 102.948 1.00 65.40 C
ATOM 1144 O GLU C 101 12.011 -40.595 103.895 1.00 65.31 O
ATOM 1145 CB GLU C 101 10.779 -42.877 104.712 1.00 67.04 C
...
The desired output is the same, but with the TER record removed:
...
ATOM 1133 N VAL C 100 12.484 -30.583 106.831 1.00 30.28 N
ATOM 1134 CA VAL C 100 11.430 -31.194 106.033 1.00 34.41 C
ATOM 1135 C VAL C 100 11.985 -32.402 105.259 1.00 39.25 C
ATOM 1136 O VAL C 100 11.248 -33.126 104.568 1.00 46.37 O
ATOM 1137 CB VAL C 100 10.822 -30.174 105.029 1.00 35.16 C
ATOM 1138 CG1 VAL C 100 10.159 -29.020 105.767 1.00 36.95 C
ATOM 1139 CG2 VAL C 100 11.865 -29.669 104.007 1.00 30.60 C
ATOM 1141 N GLU C 101 12.344 -43.792 102.987 1.00 64.25 N
ATOM 1142 CA GLU C 101 11.253 -42.785 103.240 1.00 66.15 C
ATOM 1143 C GLU C 101 11.742 -41.350 102.948 1.00 65.40 C
ATOM 1144 O GLU C 101 12.011 -40.595 103.895 1.00 65.31 O
ATOM 1145 CB GLU C 101 10.779 -42.877 104.712 1.00 67.04 C
...
Any ideas on how to remove the TER record using Biopython?
Answer 0 (score: 1)
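The residues still belong to their original chain objects. In other words, overwriting the id does not change which residues belong to chain A, so the two chains are still written out separately, with a TER record after the first one.
You can instead add the residues from chain B to chain A and then delete chain B.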
# Read a PDB file with two chains
from Bio import PDB

pdbl = PDB.PDBList()
# Request the legacy PDB format explicitly and use the path that is returned
pdb_path = pdbl.retrieve_pdb_file('5K04', file_format='pdb')
parser = PDB.PDBParser()
structure = parser.get_structure('5K04', pdb_path)

# Get all chains
chains = list()
for model in structure:
    for chain in model:
        chains.append(chain)

# Next free residue number: one past the last residue in the first chain
len_chain_a = int(chains[0].get_unpacked_list()[-1].id[1]) + 1

# Move all residues from the 2nd chain (iterate over a copy, since the chain is modified)
for i, residue in enumerate(list(chains[1].get_residues())):
    # Detach first so the new id cannot clash with ids still present in the 2nd chain
    chains[1].detach_child(residue.id)
    old_id = list(residue.id)
    old_id[1] = len_chain_a + i
    # Increment the id
    residue.id = tuple(old_id)
    # Add the residue to the first chain
    chains[0].add(residue)

# Now delete all chains but the first (iterate over a copy while detaching)
for model in structure:
    for chain in list(model):
        if chain.id != 'A':
            model.detach_child(chain.id)

# Save the merged chains
pdb_io = PDB.PDBIO()
pdb_io.set_structure(structure)
pdb_io.save('5k04_merged.pdb')
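To try the same idea without downloading anything, here is a minimal sketch that runs on a tiny in-memory structure. The helper names `ca_line` and `merge_chains`, the three made-up CA records, and the final rename to chain C are assumptions for illustration and are not part of the answer above. It assumes the target chain is non-empty and that its highest residue number marks where the appended residues should continue.

```python
from io import StringIO

from Bio.PDB import PDBIO, PDBParser


def ca_line(serial, resname, chain_id, resseq):
    """Build one fixed-column ATOM record (CA atom, dummy coordinates)."""
    return (
        f"ATOM  {serial:5d}  CA  {resname:>3} {chain_id}{resseq:4d}    "
        f"{float(serial):8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{30.0:6.2f}"
        f"          {'C':>2}"
    )


def merge_chains(model, target_id, source_id, new_id):
    """Move all residues of source_id into target_id, then rename the result."""
    target = model[target_id]
    source = model[source_id]
    # Continue numbering after the highest residue number already in the target
    next_num = max(res.id[1] for res in target) + 1
    for residue in list(source):  # copy: the chain is modified while looping
        # Detach before renumbering so the new id cannot clash inside source
        source.detach_child(residue.id)
        residue.id = (residue.id[0], next_num, residue.id[2])
        target.add(residue)
        next_num += 1
    # Drop the now-empty source chain, then rename the merged chain
    model.detach_child(source_id)
    target.id = new_id


# Hypothetical input: one residue in chain A, two residues in chain B
pdb_text = "\n".join([
    ca_line(1, "VAL", "A", 100),
    ca_line(2, "GLU", "B", 1),
    ca_line(3, "ALA", "B", 2),
]) + "\n"

structure = PDBParser(QUIET=True).get_structure("demo", StringIO(pdb_text))
merge_chains(structure[0], "A", "B", "C")

# Write to an in-memory buffer instead of a file
buffer = StringIO()
writer = PDBIO()
writer.set_structure(structure)
writer.save(buffer)
merged_text = buffer.getvalue()
print(merged_text)
```

Since `PDBIO` writes one TER record per chain, the printed text should show a single chain C terminated by one TER record, with no TER between the residues that came from A and B.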