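I need to be able to determine, from a Swissprot file, the secondary structure (e.g. strand, helix, etc.) and the domain (e.g. signal) at a specific position in a protein. Printing the description and the FT lines of the Swissprot file gives the following: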
RecName: Full=Insulin; Contains: RecName: Full=Insulin B chain; Contains: RecName: Full=Insulin A chain; Flags: Precursor;
('SIGNAL', 1, 24, '{ECO:0000269|PubMed:14426955}.', '')
('PEPTIDE', 25, 54, 'Insulin B chain.', 'PRO_0000015819')
('PROPEP', 57, 87, 'C peptide.', 'PRO_0000015820')
('PEPTIDE', 90, 110, 'Insulin A chain.', 'PRO_0000015821')
('STRAND', 26, 29, '{ECO:0000244|PDB:4EFX}.', '')
('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', '')
('HELIX', 44, 46, '{ECO:0000244|PDB:3W7Y}.', '')
('HELIX', 91, 97, '{ECO:0000244|PDB:3W7Y}.', '')
('STRAND', 98, 101, '{ECO:0000244|PDB:4EFX}.', '')
('HELIX', 102, 106, '{ECO:0000244|PDB:3W7Y}.', '')
('TURN', 107, 109, '{ECO:0000244|PDB:1HIQ}.', '')
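This format has thrown me; I am guessing it is a list of tuples. Given the position of an amino acid, e.g. 45, how do I extract the information to determine whether it is in a helix?
My code so far is: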
#!/usr/bin/env python
import time
import sys
import os
from Bio import ExPASy
from Bio import SwissProt
# This section receives the parameters from user input via the website:
# This will be commented out during the development period and temp.
# variables will be used.
# acc_number = sys.argv[1]
# wild_aa = sys.argv[2]
# position = sys.argv[3]
# mutant_aa = sys.argv[4]
#Temp variables for developing:
acc_number = 'P01308'
wild_aa = 'L'
position = '43'
mutant_aa = 'P'
handle = ExPASy.get_sprot_raw(acc_number)
# this reads the swissprot file:
record = SwissProt.read(handle)
# test to see if record has been retrieved:
print(record.description)
# next section will parse the sequence information using the position variable
# and then will determine the secondary structure and domain location of the mutation
# accessing the secondary structure and domain information from FT lines
for feature in record.features:
    print(feature)
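I have been reading like crazy about tuples (for nearly a week now) and think I have worked out how to extract information from them; it is more the matching of a position to a secondary structure that I am stuck on.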
I hope I am making sense, Ally
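Answer 0 (score: 0)
You can access the items in a tuple by index, so the feature start is feature[1] and the feature end is feature[2]. To print only the features that overlap the position you are interested in, you could use something like this inside your loop: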
if feature[1] <= position and feature[2] >= position:
    print(feature)
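(Note that this only works if position is a number. In your code it is a string, so you need to remove the quotes around the value, or convert it with int(position), which you will need anyway once the value comes from sys.argv.)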
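Putting that together, here is a minimal sketch of a position lookup. It assumes the features are plain tuples laid out as (key, start, end, description, id), exactly as printed in the question; the feature list is copied from that output so the example runs without a network call. The helper names features_at and secondary_structure_at, and the choice of HELIX, STRAND and TURN as the secondary-structure keys, are my own and not part of Biopython. Depending on your Biopython version, record.features may hold feature objects rather than tuples, in which case the indexing would need adapting.

```python
# Feature tuples copied from the question's output:
# (key, start, end, description, feature id)
FEATURES = [
    ('SIGNAL', 1, 24, '{ECO:0000269|PubMed:14426955}.', ''),
    ('PEPTIDE', 25, 54, 'Insulin B chain.', 'PRO_0000015819'),
    ('PROPEP', 57, 87, 'C peptide.', 'PRO_0000015820'),
    ('PEPTIDE', 90, 110, 'Insulin A chain.', 'PRO_0000015821'),
    ('STRAND', 26, 29, '{ECO:0000244|PDB:4EFX}.', ''),
    ('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('HELIX', 44, 46, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('HELIX', 91, 97, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('STRAND', 98, 101, '{ECO:0000244|PDB:4EFX}.', ''),
    ('HELIX', 102, 106, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('TURN', 107, 109, '{ECO:0000244|PDB:1HIQ}.', ''),
]

# Assumption: these are the keys you want to treat as secondary structure.
SECONDARY_STRUCTURE_KEYS = {'HELIX', 'STRAND', 'TURN'}


def features_at(features, position):
    """Return every feature tuple whose start..end range covers position."""
    position = int(position)  # accepts '45' as well as 45
    return [f for f in features if f[1] <= position <= f[2]]


def secondary_structure_at(features, position):
    """Return the first secondary-structure key covering position, or None."""
    for feature in features_at(features, position):
        if feature[0] in SECONDARY_STRUCTURE_KEYS:
            return feature[0]
    return None


# Position 45 lies in the B chain peptide and in the helix spanning 44-46.
for feature in features_at(FEATURES, '45'):
    print(feature)
print(secondary_structure_at(FEATURES, 45))
```

In your script you would pass record.features instead of FEATURES. Converting with int() inside features_at means the string position from sys.argv can be passed straight in.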