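I need to get information about the length and domain architecture of a specific protein, for example 1btk. To do that I need its UniProtKB accession. How can I get it?
According to the page http://www.rcsb.org/pdb/explore.do?structureId=1BTK
the UniProtKB accession is 'Q06187'.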
Answer 0 (score: 1)
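You can download the PDB file with urllib2 and then extract the UniProt ID with a regular expression. (urllib2 is a Python 2 module; on Python 3 the equivalent is urllib.request.)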
import re
import urllib2  # Python 2 only

url_template = "http://www.rcsb.org/pdb/files/{}.pdb"
protein = "1BTK"
url = url_template.format(protein)

response = urllib2.urlopen(url)
pdb = response.read()
response.close()  # best practice to close the file

# DBREF records name the database (UNP = UniProt) followed by the accession
m = re.search(r'UNP +(\w+)', pdb)
m.group(1)
# you get 'Q06187'
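For readers on Python 3, the same two steps (download, then regex) could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `fetch_pdb_text` and `uniprot_ids` are made up for illustration, the URL template is the one from the answer and may no longer resolve, and the search is limited to plain `DBREF` records so that a stray "UNP" elsewhere in the file is not picked up.

```python
import re
import urllib.request

# URL pattern copied from the answer above; the site layout may have changed.
URL_TEMPLATE = "http://www.rcsb.org/pdb/files/{}.pdb"


def fetch_pdb_text(pdb_id):
    """Download a PDB entry and return it as text (hypothetical helper)."""
    url = URL_TEMPLATE.format(pdb_id)
    # The context manager closes the connection for us.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("ascii", errors="replace")


def uniprot_ids(pdb_text):
    """Return the UniProt accessions named in DBREF records, in file order."""
    found = []
    for line in pdb_text.splitlines():
        # "DBREF " with a trailing space skips the DBREF1/DBREF2 variants.
        if not line.startswith("DBREF "):
            continue
        match = re.search(r"\bUNP +(\w+)", line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found
```

Calling `uniprot_ids(fetch_pdb_text("1BTK"))` should then return a list containing the accession quoted in the question, provided the download succeeds. Returning a list rather than a single match is a design choice here: a structure with several chains can reference more than one UniProt entry.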
Bonus, if you want to parse the PDB file:
from Bio.PDB.PDBParser import PDBParser
response = urllib2.urlopen(url)
parser = PDBParser()
structure = parser.get_structure(protein, response)
response.close() # best practice to close the file
header = parser.get_header()
trailer = parser.get_trailer()
# structure, header and trailer now hold the information about the protein