Suppose I have 3 DNA sequences and I want to score them with certain functions:
```python
seq1 = 'AG_CT'
seq2 = 'AG_CT'
seq3 = 'ACT_T'
```
How do I compute the consensus score and the weighted sum of pairs (WSP) score of these 3 DNA sequences in Python?
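The consensus score of an alignment $A$ is the sum of pairwise scores between the sequences and the consensus sequence:

$$\text{Consensus}(A) = \sum_{i=1}^{l} d(i)$$

where $l$ is the length of the sequences and $d$ is the distance between two bases: $d(A,B) = 2$ for $A \neq B$, $d(A,-) = d(-,A) = 1$ for $A \neq -$, and $0$ otherwise. Here $A$ and $B$ can be any of A, C, G or T, and `_` in the sequences denotes the gap `-`. In the worked example below, $d(i)$ is the sum of the distances over all pairs of sequences in column $i$.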
We calculate the distance between seq1 and seq2, then between seq1 and seq3, then between seq2 and seq3, position by position:
**seq1 and seq2:**

d(A,A)=0, d(G,G)=0, d(-,-)=0, d(C,C)=0, d(T,T)=0

**seq1 and seq3:**

d(A,A)=0, d(G,C)=2, d(-,T)=1, d(C,-)=1, d(T,T)=0

**seq2 and seq3:**

d(A,A)=0, d(G,C)=2, d(-,T)=1, d(C,-)=1, d(T,T)=0
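| Position | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| seq1 | A | G | _ | C | T |
| seq2 | A | G | _ | C | T |
| seq3 | A | C | T | _ | T |
| d(seq1, seq2) | 0 | 0 | 0 | 0 | 0 |
| d(seq1, seq3) | 0 | 2 | 1 | 1 | 0 |
| d(seq2, seq3) | 0 | 2 | 1 | 1 | 0 |
| Column sum | 0 | 4 | 2 | 2 | 0 |

0 + 4 + 2 + 2 + 0 = 8, so Consensus(A) = 8.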
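To check the table above, it can be rebuilt mechanically: one row per pair of sequences, one column per position, then the column sums. This is a minimal sketch, assuming `_` is the only gap symbol (as in seq1 to seq3); the names `d`, `pair_rows` and `GAP` are hypothetical and not part of the question.

```python
from itertools import combinations

GAP = "_"  # assumption: "_" is the gap symbol, as in seq1..seq3


def d(a, b):
    """Distance from the question: 0 if equal, 1 if one side is a gap, 2 otherwise."""
    if a == b:
        return 0
    if a == GAP or b == GAP:
        return 1
    return 2


def pair_rows(seqs, score):
    """Map each index pair (i, j) with i < j to its list of per-position scores."""
    return {
        (i, j): [score(x, y) for x, y in zip(seqs[i], seqs[j])]
        for i, j in combinations(range(len(seqs)), 2)
    }


seqs = ["AG_CT", "AG_CT", "ACT_T"]
rows = pair_rows(seqs, d)
for (i, j), row in rows.items():
    print(f"seq{i + 1}/seq{j + 1}:", row)

# zip(*rows.values()) turns the per-pair rows into per-position columns
column_sums = [sum(col) for col in zip(*rows.values())]
print("column sums:", column_sums, "total:", sum(column_sums))
```

The last line prints `column sums: [0, 4, 2, 2, 0] total: 8`, matching the table.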
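The weighted sum of pairs is

$$\text{WSP}(A) = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \sum_{h=1}^{l} w_{ij}\, s(A[i,h], A[j,h])$$

where $l$ is the sequence length, $k$ is the number of sequences, $A[i,h]$ is the base of sequence $i$ at position $h$, $w_{ij}$ is the weight of the pair of sequences $i$ and $j$, and $s$ is the pair score: $s(A,B) = 2$ for $A \neq B$, $s(A,-) = s(-,A) = -1$ for $A \neq -$, and $3$ otherwise. All weight factors are 1.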
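The formula allows a separate weight $w_{ij}$ for every pair of sequences. A direct, loop-for-loop translation of the triple sum might look like the sketch below, assuming the weights are given as a hypothetical k × k nested list of which only the entries with i < j are read; `s`, `wsp` and `weights` are illustrative names, not from the question.

```python
GAP = "_"  # assumption: "_" is the gap symbol, as in seq1..seq3


def s(a, b):
    """Pair score from the question: 3 if equal, -1 if one side is a gap, 2 otherwise."""
    if a == b:
        return 3
    if a == GAP or b == GAP:
        return -1
    return 2


def wsp(seqs, w):
    """Weighted sum of pairs: sum over i < j and positions h of w[i][j] * s(A[i,h], A[j,h])."""
    k = len(seqs)
    length = len(seqs[0])
    total = 0
    for i in range(k - 1):             # i = 1 .. k-1 in the formula (0-based here)
        for j in range(i + 1, k):      # j = i+1 .. k
            for h in range(length):    # h = 1 .. l
                total += w[i][j] * s(seqs[i][h], seqs[j][h])
    return total


seqs = ["AG_CT", "AG_CT", "ACT_T"]
weights = [[1] * len(seqs) for _ in seqs]  # all weight factors are 1
print(wsp(seqs, weights))  # 27
```

With every weight equal to 1 this reduces to the plain sum of pair scores; changing a single `weights[i][j]` scales only that pair's contribution.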
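| Position | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| seq1 | A | G | _ | C | T |
| seq2 | A | G | _ | C | T |
| seq3 | A | C | T | _ | T |
| s(seq1, seq2) | 3 | 3 | 3 | 3 | 3 |
| s(seq1, seq3) | 3 | 2 | -1 | -1 | 3 |
| s(seq2, seq3) | 3 | 2 | -1 | -1 | 3 |
| Column sum | 9 | 7 | 1 | 1 | 9 |

With all $w_{ij} = 1$:

$(3+3+3)\cdot 1 + (3+2+2)\cdot 1 + (3-1-1)\cdot 1 + (3-1-1)\cdot 1 + (3+3+3)\cdot 1 = 9 + 7 + 1 + 1 + 9 = 27$

So the WSP score of the three sequences is 27.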
**Answer**
I would approach this as follows. First, create functions to calculate the individual distances and weighted pair sums:
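```python
def distance(a, b):
    """Distance between two bases a and b ("_" is a gap)."""
    if a == b:
        return 0  # identical bases (or two gaps)
    elif a == "_" or b == "_":
        return 1  # base against gap
    else:
        return 2  # two different bases
```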
and
```python
def w_sum(a, b, w=1):
    """Calculate the pair sum of bases a and b with weighting w."""
    if a == b:
        return 3 * w   # identical bases (or two gaps)
    elif a == "_" or b == "_":
        return -1 * w  # base against gap
    else:
        return 2 * w   # two different bases
```
Second, use the `zip` function to group the bases at each position:
```python
list(zip(seq1, seq2, seq3)) == [('A', 'A', 'A'),
                                ('G', 'G', 'C'),
                                ('_', '_', 'T'),
                                ('C', 'C', '_'),
                                ('T', 'T', 'T')]
```
Third, use `itertools.combinations` to generate the pairs within each position:
```python
from itertools import combinations

list(combinations(('G', 'G', 'C'), 2)) == [('G', 'G'),
                                           ('G', 'C'),
                                           ('G', 'C')]
```
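Applied to sequence indices instead of bases, `combinations` shows why each pair of sequences is counted exactly once: it yields index pairs in the order they appear, so always with i < j, which matches the limits $j = i+1 \ldots k$ in the WSP formula (shifted to 0-based indices). A short illustration for k = 3:

```python
from itertools import combinations
from math import comb

k = 3  # number of sequences
index_pairs = list(combinations(range(k), 2))
print(index_pairs)  # [(0, 1), (0, 2), (1, 2)]

# One entry per unordered pair, i.e. k * (k - 1) / 2 pairs per position
print(len(index_pairs) == comb(k, 2))  # True
```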
Finally, add up the distances and the pair sums:
```python
from itertools import combinations

consensus = 0
wsp = 0
for position in zip(seq1, seq2, seq3):      # bases at the same position
    for pair in combinations(position, 2):  # pairs within that position
        consensus += distance(*pair)        # add the distance
        wsp += w_sum(*pair)                 # add the weighted pair score

print(consensus, wsp)  # 8 27
```
Note the use of `*pair` to unpack each 2-tuple of bases into the two arguments of the scoring functions.
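Putting the pieces together, the loop can be wrapped in a function that accepts any number of aligned sequences. This is a self-contained sketch that repeats the answer's `distance` and `w_sum`; `score_alignment` is a hypothetical name, and it still uses one weight `w` for every pair, as `w_sum` does.

```python
from itertools import combinations


def distance(a, b):
    """Distance between two bases a and b ("_" is a gap)."""
    if a == b:
        return 0
    elif a == "_" or b == "_":
        return 1
    else:
        return 2


def w_sum(a, b, w=1):
    """Pair sum of bases a and b with weighting w."""
    if a == b:
        return 3 * w
    elif a == "_" or b == "_":
        return -1 * w
    else:
        return 2 * w


def score_alignment(*seqs, w=1):
    """Return (consensus, wsp) for equal-length aligned sequences."""
    consensus = 0
    wsp = 0
    for position in zip(*seqs):              # bases at the same position
        for a, b in combinations(position, 2):  # each pair once
            consensus += distance(a, b)
            wsp += w_sum(a, b, w)
    return consensus, wsp


print(score_alignment("AG_CT", "AG_CT", "ACT_T"))  # (8, 27)
```

Because `zip` stops at the shortest input, the sequences are assumed to be already aligned to the same length.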