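Removing heteroatoms from a PDB file

Time: 2014-09-08 05:59:57

Tags: python biopython protein-database

I need to remove the heteroatoms from a PDB file. Here is my code, but it does not work on my test PDB, 1C4R.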

for model in structure:
    for chain in model:
        for residue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

Any suggestions?

2 answers:

Answer 0 (score: 2)

Heteroatoms should not be part of the chain. But you can check whether a residue is a heteroatom:

from Bio.PDB import PDBParser

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")

for residue in pdb.get_residues():
    tags = residue.get_full_id()

    # tags is a tuple: (Structure ID, Model ID, Chain ID, (Residue ID))
    # Residue ID is itself a tuple: (*Hetero Field*, Residue ID, Insertion Code)

    # The Hetero Field is what matters here: it is a blank (" ") if the residue
    # is not a heteroatom, and carries a flag if it is ("W" for waters, "H_..." etc.)

    if tags[3][0] != " ":
        pass  # The residue is a heteroatom
    else:
        pass  # It is not

You can also get just the residue id (without the first three fields) with:

tags = residue.id

# or het_flag, _, _ = residue.id

if tags[0] != " ":
    pass  # The residue is a heteroatom
else:
    pass  # It is not

Here is a link to the relevant documentation: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

The topic is on page 8, "What is a residue id?". Quoting:

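  This is a bit more complicated, due to the clumsy PDB format. A residue id is a tuple with three elements:

  • The hetero-flag: this is 'H_' plus the name of the hetero-residue (e.g. 'H_GLC' in the case of a glucose molecule), or 'W' in the case of a water molecule.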
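To make the quoted rule concrete, here is a minimal sketch that classifies a residue id by its hetero-flag. It works on plain tuples shaped like `residue.id`, so it needs no PDB file; the function name `classify_residue_id` and the sample ids are made up for illustration.

```python
def classify_residue_id(residue_id):
    """Classify a (hetero_flag, sequence_number, insertion_code) tuple."""
    het_flag = residue_id[0]
    if het_flag == " ":
        return "standard"                # ordinary residue, blank flag
    if het_flag == "W":
        return "water"                   # water molecule
    if het_flag.startswith("H_"):
        return "hetero:" + het_flag[2:]  # e.g. "H_GLC" -> "hetero:GLC"
    return "unknown"


# Hypothetical ids, shaped like the tuples Biopython exposes as residue.id
examples = [(" ", 10, " "), ("W", 301, " "), ("H_GLC", 1, " ")]
labels = [classify_residue_id(rid) for rid in examples]
```

Anything that is not classified as "standard" is what the question calls a heteroatom.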

Following up on the comments, and going one step further:

from Bio.PDB import PDBParser, PDBIO, Select

class NonHetSelect(Select):
    def accept_residue(self, residue):
        return 1 if residue.id[0] == " " else 0

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
io = PDBIO()
io.set_structure(pdb)
io.save("non_het.pdb", NonHetSelect())
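If Biopython is not at hand, a much cruder alternative is to drop the HETATM records at the text level. This is not what the answer above does: it filters by record name rather than by residue id, and it leaves other hetero-related records (HET, CONECT and so on) in place. The function name `strip_hetatm_lines` and the sample records are hypothetical.

```python
def strip_hetatm_lines(pdb_text):
    """Return pdb_text without the lines whose record name is HETATM."""
    return "".join(
        line
        for line in pdb_text.splitlines(keepends=True)
        if not line.startswith("HETATM")
    )


# Tiny made-up sample: one protein atom, one water, and the END record
sample = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "HETATM    2  O   HOH A 101       9.000   9.000   9.000  1.00  0.00           O\n"
    "END\n"
)
cleaned = strip_hetatm_lines(sample)
```

For anything beyond a quick cleanup, the `Select` subclass above is the safer route, since it works on the parsed structure.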

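Answer 1 (score: 0)

I once used the "Removing residues" code from http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython

It misses some heteroatoms. I guess this is because the chain changes every time detach_child is called, while the loop is still iterating over it.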

for model in structure:
    for chain in model:
        for residue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)
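The suspected cause can be reproduced without Biopython. The sketch below uses a plain Python list as a stand-in for the chain, on the assumption that the chain iterates over an internal list of residues; the residue labels and function names are made up.

```python
def remove_in_place(items):
    """Remove waters while iterating over the same list (buggy)."""
    items = list(items)
    for item in items:
        if item.startswith("HOH"):
            items.remove(item)  # shifts later elements left; the next one is skipped
    return items


def remove_via_copy(items):
    """Remove waters while iterating over a snapshot of the list (safe)."""
    items = list(items)
    for item in list(items):  # iterate over a copy, mutate the original
        if item.startswith("HOH"):
            items.remove(item)
    return items


residues = ["HOH1", "HOH2", "ALA3", "HOH4", "HOH5"]
buggy = remove_in_place(residues)   # ["HOH2", "ALA3", "HOH5"]: two waters survive
fixed = remove_via_copy(residues)   # ["ALA3"]
```

Each removal shifts the remaining elements, so the element right after a removed one is never visited. That matches the symptom of some heteroatoms being left behind.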

After modifying it as follows (simply avoiding changes to the iterable while iterating over it), it works fine for me. (I only use structure[0] here.)

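model = structure[0]

# First pass: only collect what has to go, do not modify anything yet
residue_to_remove = []
for chain in model:
    for residue in chain:
        if residue.id[0] != ' ':
            residue_to_remove.append((chain.id, residue.id))

# Second pass: detach the collected residues
for chain_id, residue_id in residue_to_remove:
    model[chain_id].detach_child(residue_id)

# Look for empty chains only after the residues are gone;
# checking earlier would never find a chain that held only hetero residues
chain_to_remove = [chain.id for chain in model if len(chain) == 0]
for chain_id in chain_to_remove:
    model.detach_child(chain_id)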