Extracting secondary structure data from Swissprot feature tuples in Python

Time: 2014-12-10 11:45:55

Tags: python tuples bioinformatics biopython

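I need to be able to determine, from a Swissprot file, the secondary structure (e.g. strand, helix) and the domain (e.g. signal) at a specific position in a protein. Printing the record description and the FT lines from the Swissprot file gives the following: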

RecName: Full=Insulin; Contains: RecName: Full=Insulin B chain; Contains: RecName: Full=Insulin A chain; Flags: Precursor;
('SIGNAL', 1, 24, '{ECO:0000269|PubMed:14426955}.', '')
('PEPTIDE', 25, 54, 'Insulin B chain.', 'PRO_0000015819')
('PROPEP', 57, 87, 'C peptide.', 'PRO_0000015820')
('PEPTIDE', 90, 110, 'Insulin A chain.', 'PRO_0000015821')
('STRAND', 26, 29, '{ECO:0000244|PDB:4EFX}.', '')
('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', '')
('HELIX', 44, 46, '{ECO:0000244|PDB:3W7Y}.', '')
('HELIX', 91, 97, '{ECO:0000244|PDB:3W7Y}.', '')
('STRAND', 98, 101, '{ECO:0000244|PDB:4EFX}.', '')
('HELIX', 102, 106, '{ECO:0000244|PDB:3W7Y}.', '')
('TURN', 107, 109, '{ECO:0000244|PDB:1HIQ}.', '')

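This format has thrown me; I'm guessing they are nested tuples. Given the position of an amino acid, e.g. 45, how can I extract the information to determine whether it lies in a helix?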
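Each line printed above is one flat tuple, not a nested one, and `record.features` is a list of such tuples. The following is a minimal sketch that unpacks the fields by name, assuming the five fields are (feature type, start, end, description, feature ID) as the printed output suggests. The sample data is copied by hand from that output and the variable names are illustrative only. (Newer Biopython releases may represent features as objects rather than tuples, so check what your version prints.)

```python
# A few of the feature tuples printed above, copied by hand for illustration.
features = [
    ('SIGNAL', 1, 24, '{ECO:0000269|PubMed:14426955}.', ''),
    ('PEPTIDE', 25, 54, 'Insulin B chain.', 'PRO_0000015819'),
    ('STRAND', 26, 29, '{ECO:0000244|PDB:4EFX}.', ''),
    ('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('HELIX', 44, 46, '{ECO:0000244|PDB:3W7Y}.', ''),
]

# Each tuple is flat, so it can be unpacked straight into named variables.
for ft_type, start, end, description, ft_id in features:
    print(ft_type, start, end)

# Plain indexing works too: [0] is the type, [1] the start, [2] the end.
first = features[0]
print(first[0], first[1], first[2])  # SIGNAL 1 24
```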

My code so far is:

#!/usr/bin/env python

import time
import sys 
import os
from Bio import ExPASy 
from Bio import SwissProt 

# This section receives the parameters from user input via the website:
# This will be commented out during the development period and temp. 
# variables will be used.

# acc_number = sys.argv[1]
# wild_aa = sys.argv[2]
# position = sys.argv[3]
# mutant_aa = sys.argv[4]

#Temp variables for developing:

acc_number = 'P01308'
wild_aa = 'L'
position = '43'
mutant_aa = 'P'

handle = ExPASy.get_sprot_raw(acc_number)

# this reads the swissprot file:
record = SwissProt.read(handle)

# test to see if record has been retrieved:
print(record.description)

# next section will parse the sequence information using the position variable
# and then will determine the secondary structure and domain location of the mutation

# accessing the secondary structure and domain information from FT lines
for feature in record.features:
    print(feature)

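I've been reading up on tuples like crazy (I've been at it for almost a week now) and think I've worked out how to extract information from them; what I'm stuck on is matching a position to its secondary structure.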
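If many positions need to be checked against the same record, one option is to expand each feature's start..end range once into a dictionary keyed by residue number. This is a minimal sketch, not something Biopython provides: `build_residue_map` is a hypothetical helper, the ranges are assumed to be inclusive at both ends, and the sample tuples are copied from the question's output.

```python
from collections import defaultdict

def build_residue_map(features):
    """Map each residue number to the list of feature types covering it."""
    residue_map = defaultdict(list)
    for ft_type, start, end, _description, _ft_id in features:
        # Assumption: skip entries whose bounds are not plain integers
        # (e.g. fuzzy or unknown positions), since they cannot be expanded.
        if not (isinstance(start, int) and isinstance(end, int)):
            continue
        # Treat the feature range as inclusive at both ends.
        for pos in range(start, end + 1):
            residue_map[pos].append(ft_type)
    return residue_map

features = [
    ('SIGNAL', 1, 24, '{ECO:0000269|PubMed:14426955}.', ''),
    ('PEPTIDE', 25, 54, 'Insulin B chain.', 'PRO_0000015819'),
    ('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('HELIX', 44, 46, '{ECO:0000244|PDB:3W7Y}.', ''),
]

residue_map = build_residue_map(features)
print(residue_map[45])  # ['PEPTIDE', 'HELIX']
print(residue_map[10])  # ['SIGNAL']
```

Building the map costs one pass over every residue in every feature, after which each lookup is a single dictionary access.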

I hope I'm making sense, Ally

1 answer:

Answer 0 (score: 0)

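You can access the items in a tuple by index, so the start of a feature is feature[1] and its end is feature[2]. To print only the features that overlap the position you're interested in, you can use something like this: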

if feature[1] <= position and feature[2] >= position:
    print feature
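A slightly fuller sketch of the same check, with `position` as an integer: Python's chained comparison `start <= position <= end` expresses the overlap test, and restricting it to the `HELIX`, `STRAND` and `TURN` keys (the secondary-structure types that appear in the question's output) answers "is residue 45 in a helix?". `secondary_structure_at` is a hypothetical helper name, and the tuples are copied from the question.

```python
# Secondary-structure feature keys seen in the question's output.
SECONDARY_STRUCTURE_TYPES = {'HELIX', 'STRAND', 'TURN'}

def secondary_structure_at(features, position):
    """Return the secondary-structure feature types that cover `position`."""
    hits = []
    for feature in features:
        ft_type, start, end = feature[0], feature[1], feature[2]
        if ft_type in SECONDARY_STRUCTURE_TYPES and start <= position <= end:
            hits.append(ft_type)
    return hits

features = [
    ('SIGNAL', 1, 24, '{ECO:0000269|PubMed:14426955}.', ''),
    ('PEPTIDE', 25, 54, 'Insulin B chain.', 'PRO_0000015819'),
    ('PROPEP', 57, 87, 'C peptide.', 'PRO_0000015820'),
    ('PEPTIDE', 90, 110, 'Insulin A chain.', 'PRO_0000015821'),
    ('STRAND', 26, 29, '{ECO:0000244|PDB:4EFX}.', ''),
    ('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('HELIX', 44, 46, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('HELIX', 91, 97, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('STRAND', 98, 101, '{ECO:0000244|PDB:4EFX}.', ''),
    ('HELIX', 102, 106, '{ECO:0000244|PDB:3W7Y}.', ''),
    ('TURN', 107, 109, '{ECO:0000244|PDB:1HIQ}.', ''),
]

print(secondary_structure_at(features, 45))  # ['HELIX']
print(secondary_structure_at(features, 30))  # []
print('HELIX' in secondary_structure_at(features, 45))  # True
```

An empty list means no annotated secondary structure covers that residue; domain-type keys such as `SIGNAL` could be handled the same way with a separate set.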

(Note that this only works if position is a number. In your code it is a string, so you need to remove the quotes around the value.)
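Removing the quotes covers the hard-coded test value, but command-line arguments such as `sys.argv[3]` always arrive as strings, so the value needs converting with `int()` before the comparison. In Python 3, comparing a string with an integer raises `TypeError`; in Python 2 it does not raise but gives meaningless results. This is a minimal sketch; `parse_position` is a hypothetical helper, and the lower bound of 1 assumes positions are numbered from 1 as in the feature table above.

```python
def parse_position(text):
    """Convert a residue position given as text (e.g. from sys.argv) to an int."""
    try:
        position = int(text)
    except ValueError:
        raise ValueError('position must be a whole number, got %r' % (text,))
    # Feature-table positions start at 1.
    if position < 1:
        raise ValueError('position must be 1 or greater, got %d' % position)
    return position

position = parse_position('43')
feature = ('HELIX', 33, 43, '{ECO:0000244|PDB:3W7Y}.', '')
print(feature[1] <= position and feature[2] >= position)  # True
```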