I have a sequence alignment:
RefSeq     :MXKQRSLPLXQKRTKQAISFSASHRIYLQRKFSH .....
Templatepdb:-----------------ISFSASHR------FSHAQADFAG
I am trying to write code that renumbers the residues in a PDB file according to this alignment:
Original pdb: RES ID = 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 5 ...
New pdb: RES ID = 18 18 18 19 19 19 19 19 20 20 20 21 21 22 23 24 25 31 31 31 31 32 32 33 34 35 36 ...
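This is easy when the alignment only has gaps at its start: just count the gaps ("-") and add that total to the number in residue.id.
However, I can't find a way to do it when there are gaps in the middle of the sequence.
Do you have any suggestions?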
Answer 0: (score: 2)
If I understand correctly, your input is an alignment:
-----------------ISFSASHR------FSHAQADFAG
and a list of residue numbers. Your output is that same list with each residue number shifted by the number of gaps that precede the residue in the alignment, for example:
[18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41]
The following code does this:
import itertools
import random


def random_residue_number(sequence):
    # Test data: residue i + 1 is repeated 1 to 10 times, like per-atom residue IDs.
    nested = [[i + 1] * random.randint(1, 10) for i in range(len(sequence))]
    merged = list(itertools.chain.from_iterable(nested))
    return merged


def aligned_residue_number(alignment, original_number):
    gap_shift = 0      # number of gaps seen so far
    residue_count = 0  # number of residues seen so far (the original residue number)
    shift_dict = {}    # original residue number -> shifted residue number
    for residue in alignment:
        if residue == '-':
            gap_shift += 1
        else:
            residue_count += 1
            shift_dict[residue_count] = gap_shift + residue_count
    return [shift_dict[number] for number in original_number]


sequence = 'ISFSASHRFSHAQADFAG'
alignment = '-----------------ISFSASHR------FSHAQADFAG'
original_number = random_residue_number(sequence)
print(original_number)
# Sample output (the repeat counts are random, so they differ between runs):
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18, 18]
new_number = aligned_residue_number(alignment, original_number)
print(new_number)
# [18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41]
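The code above demonstrates this. There are many ways to compute the output.
The way I do it is to keep a dictionary, shift_dict,
whose keys are the original numbers and whose values are the shifted numbers.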
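As one of those other ways, and to connect the mapping back to the PDB file from the question, here is a minimal sketch rather than a definitive implementation. It assumes fixed-column ATOM/HETATM records, where the residue sequence number sits in columns 23-26, and it assumes the original file numbers its residues 1, 2, 3, ... in the same order as the non-gap characters of the template row. The names `alignment_positions`, `renumber_pdb_lines` and `atom_line` are hypothetical, and the sample records use dummy coordinates.

```python
def alignment_positions(alignment, gap='-'):
    """Return the 1-based alignment column of every non-gap character."""
    return [col for col, char in enumerate(alignment, start=1) if char != gap]


def renumber_pdb_lines(lines, alignment):
    """Rewrite the residue number (columns 23-26) of ATOM/HETATM records.

    Assumption: the original residues are numbered 1..N in alignment order.
    """
    positions = alignment_positions(alignment)
    renumbered = []
    for line in lines:
        if line.startswith(('ATOM', 'HETATM')):
            old_number = int(line[22:26])
            new_number = positions[old_number - 1]
            line = line[:22] + f"{new_number:>4}" + line[26:]
        renumbered.append(line)
    return renumbered


def atom_line(serial, atom, res_name, chain, res_seq):
    """Hypothetical helper: build a fixed-column ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


alignment = '-----------------ISFSASHR------FSHAQADFAG'

# Made-up records: two atoms of residue 1 and one atom of residue 9.
pdb_lines = [
    atom_line(1, ' N', 'ILE', 'A', 1),
    atom_line(2, ' CA', 'ILE', 'A', 1),
    atom_line(3, ' N', 'PHE', 'A', 9),
]

for line in renumber_pdb_lines(pdb_lines, alignment):
    print(line)
# The residue numbers 1, 1, 9 become 18, 18, 32.
```

Because the list returned by `alignment_positions` is indexed by the original residue number minus one, it plays the same role as shift_dict without tracking the gap count explicitly. Records other than ATOM/HETATM are passed through unchanged.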