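I am a beginner in Python programming. Here is what I am trying to do:
I have an amino acid sequence (including gaps) and a corresponding PDB file. The residue numbering in the PDB file does not match the numbering in the sequence. For each amino acid entry in the PDB file, I want to find its index in the PDB file and the corresponding index in the sequence. Here is the sequence:
-LLPYFDF----DVPRNLTVTVGQT-GFLHCRVERLGDK-----DVSWIRKR----------DLHILTAGGTTYTSDQRFQVLRP---------------------------------------DGSANWTLQIKYPQPRDSGVYECQINTEP-KMSLSYTFNVVE-IVDPKFSSPIVNMTAPVGRDAFLTCVVQDLGPYKVAWLRVDTQTILTIQNHVITKNQRIGIANSEH---KTWTMRIKDIKESDKGWYMCQINTDPMKSQMGYLDVV----
Here is what I have tried so far: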
import pylab as pyl
import numpy as np
import sys
import os
import re
import argparse
import glob
def parseArgs():
    """Parse command line arguments"""
    # argparse prints its own error message and exits on invalid options,
    # so no try/except wrapper is needed here.
    parser = argparse.ArgumentParser(
        description='Read and extract items from input PDB file')
    parser.add_argument('-i',
                        '--input',
                        action='store',
                        required=True,
                        help='input PDB file in standard format')
    return parser.parse_args()
# Reads a PDB file and returns the coordinates and the residue name of
# each C-alpha atom
# (the input argument for this routine is the PDB file name.)
def get_coordinates_PDB(File_In):
    try:
        fl = open(File_In, 'r')
    except OSError:
        print('Could not open input file {0}'.format(File_In))
        sys.exit(1)
    Res = []
    Points = []
    # Read the C-alpha ATOM records from the PDB file
    with fl:
        for line in fl:
            if not line.startswith('ATOM'):
                continue
            # The atom name occupies columns 13-16 of a PDB record
            if line[12:16].strip() != 'CA':
                continue
            Res.append(line[17:20])
            # x, y, z are fixed-width fields in columns 31-54
            Points.append(np.array([float(line[30:38]),
                                    float(line[38:46]),
                                    float(line[46:54])]))
    return Points, Res
def main():
    """Read and parse a provided PDB file."""
    # Parse arguments
    args = parseArgs()
    File_In = args.input
    print(get_coordinates_PDB(File_In))

if __name__ == '__main__':
    main()
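This prints the x, y, z coordinates and the amino acid names from the PDB file. However, I am stuck at this point.
I would be very grateful if someone could help me with the rest.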
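To make the goal concrete, here is a minimal sketch of the mapping I am after. It is not a tested solution. The names `THREE_TO_ONE`, `read_ca_residues` and `map_alignment_to_pdb` are made up for illustration. It assumes a single chain, no alternate locations, and that every non-gap letter of the sequence has a C-alpha record in the PDB file, in the same order. It reads the residue number from columns 23-26 of each ATOM record.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def read_ca_residues(pdb_lines):
    """Return [(residue_number, one_letter_code), ...] for each C-alpha atom."""
    residues = []
    for line in pdb_lines:
        # PDB columns (1-based): atom name 13-16, residue name 18-20,
        # residue sequence number 23-26.
        if line.startswith('ATOM') and line[12:16].strip() == 'CA':
            resname = line[17:20]
            resseq = int(line[22:26])
            # Unknown residue names are reported as 'X'.
            residues.append((resseq, THREE_TO_ONE.get(resname, 'X')))
    return residues


def map_alignment_to_pdb(aligned_seq, residues):
    """Map each non-gap column of aligned_seq (0-based) to a PDB residue number.

    Assumption: the PDB residues follow the ungapped sequence one-to-one,
    with no residues missing from the structure.
    """
    mapping = {}
    remaining = iter(residues)
    for column, letter in enumerate(aligned_seq):
        if letter == '-':
            continue  # gaps have no counterpart in the PDB file
        try:
            resseq, pdb_letter = next(remaining)
        except StopIteration:
            break  # the PDB file has fewer residues than the sequence
        if pdb_letter != letter:
            raise ValueError(
                'Mismatch at column {0}: sequence has {1}, PDB residue {2} is {3}'
                .format(column, letter, resseq, pdb_letter))
        mapping[column] = resseq
    return mapping
```

With a real file, `read_ca_residues(open(File_In))` would supply the residue list. The keys of the returned dictionary are 0-based positions in the gapped sequence and the values are PDB residue numbers. If the structure has missing residues, the one-to-one assumption breaks and the function raises `ValueError`. In that case a proper pairwise alignment between the two sequences would probably be needed instead.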