Reading an entire directory of .pdb files with BioPython

Time: 2017-06-21 07:21:22

Tags: python biopython pdb

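I was recently tasked with writing a Python program that finds the atoms within 2 angstroms of every metal in a .pdb (Protein Data Bank) protein. This is the script I wrote for it.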

from Bio.PDB import *
parser = PDBParser(PERMISSIVE=True)

def print_coordinates(list):
    neighborList = list
    for y in neighborList:
        print "     ", y.get_coord()

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)

atomList = Selection.unfold_entities(structure, 'A')

ns = NeighborSearch(atomList)

for x in structure.get_atoms():
    if x.name == 'ZN' or x.name == 'FE' or x.name == 'CU' or x.name == 'MG' or x.name == 'CA' or x.name == 'MN':
        center = x.get_coord()
        neighbors = ns.search(center,2.0)
        neighborList = Selection.unfold_entities(neighbors, 'A')

        print x.get_id(), ': ', neighborList
        print_coordinates(neighborList)
    else:
        continue

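But this only works for a single .pdb file, and I would like to be able to read an entire directory of them. Since I have only used Java until now, I am not entirely sure how to do this in Python 2.7. One idea I had was to put the script in a try/catch statement inside a while loop and throw an exception when it reaches the end, but that is how I would do it in Java, and I don't know how I would do it in Python. So I would be glad to hear any ideas or sample code anyone might have.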

1 Answer:

Answer 0 (score: 2)

There is some redundancy in your code; for example, this does the same thing:

from Bio.PDB import *
parser = PDBParser(PERMISSIVE=True)

def print_coordinates(neighborList):
    for y in neighborList:
        print "     ", y.get_coord()

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)
metals = ['ZN', 'FE', 'CU', 'MG', 'CA', 'MN']

atomList = [atom for atom in structure.get_atoms() if atom.name in metals]
ns = NeighborSearch(Selection.unfold_entities(structure, 'A'))

for atom in atomList:
    neighbors = ns.search(atom.coord, 2)
    print("{0}: {1}".format(atom.name, neighbors))
    print_coordinates(neighbors)

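To answer your question, you can use the glob module to get a list of all the pdb files and nest your code inside a for loop that iterates over all of them. Assuming your pdb files are in /home/pdb_files/: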

from Bio.PDB import *
from glob import glob
parser = PDBParser(PERMISSIVE=True)
pdb_files = glob('/home/pdb_files/*.pdb')

def print_coordinates(neighborList):
    for y in neighborList:
        print "     ", y.get_coord()

for fileName in pdb_files:
    structure_id = fileName.rsplit('/', 1)[1][:-4]
    structure = parser.get_structure(structure_id, fileName)
    # The rest of your code
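
For reference, here is one way the pieces above might fit together. Treat it as a rough sketch rather than the definitive version: the function names (list_pdb_files, structure_id_from_path, metal_neighbors, report_directory) are made up for illustration, and it assumes you only want files ending in .pdb. It also assumes you want to match on atom.element instead of atom.name, since the alpha carbon of every residue is named 'CA' as well and would otherwise be picked up as calcium. Each print call takes a single argument, so it should behave the same on Python 2.7 and 3.

```python
import os
from glob import glob

# Element symbols treated as metals; same list as in the code above.
METALS = ('ZN', 'FE', 'CU', 'MG', 'CA', 'MN')


def list_pdb_files(directory):
    """Return the sorted paths of the *.pdb files directly inside `directory`."""
    return sorted(glob(os.path.join(directory, '*.pdb')))


def structure_id_from_path(path):
    """Use the file name without its extension as the structure id."""
    return os.path.splitext(os.path.basename(path))[0]


def metal_neighbors(structure, radius=2.0):
    """Return (metal_atom, nearby_atoms) pairs for one parsed structure."""
    # Imported here so the two file helpers above work without Biopython.
    from Bio.PDB import NeighborSearch

    atoms = list(structure.get_atoms())
    ns = NeighborSearch(atoms)
    pairs = []
    for atom in atoms:
        # Assumption: compare the element rather than the atom name, because
        # the alpha carbon of every residue is also named 'CA'.
        if atom.element in METALS:
            # The search also returns the metal itself, so filter it out.
            nearby = [a for a in ns.search(atom.coord, radius) if a is not atom]
            pairs.append((atom, nearby))
    return pairs


def report_directory(directory, radius=2.0):
    """Print each metal and the coordinates of its neighbours, file by file."""
    from Bio.PDB import PDBParser

    parser = PDBParser(PERMISSIVE=True, QUIET=True)
    for path in list_pdb_files(directory):
        structure = parser.get_structure(structure_id_from_path(path), path)
        for metal, nearby in metal_neighbors(structure, radius):
            print("{0} {1}: {2} atoms within {3} A".format(
                structure.id, metal.get_id(), len(nearby), radius))
            for atom in nearby:
                print("      {0}".format(atom.get_coord()))
```

With that in place, calling report_directory('/home/pdb_files/') would walk every .pdb file in the directory. No try/catch or while loop is needed; the for loop simply ends when the list of files is exhausted.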