Consensus score and WSP score in Python

Date: 2014-01-11 11:08:30

Tags: python wsp consensus

Suppose I have 3 DNA sequences that I want to evaluate with some scoring functions:

 seq1='AG_CT'
 seq2='AG_CT'
 seq3='ACT_T'

How can I calculate the consensus score and the weighted sum of pairs (WSP) score of these 3 DNA sequences in Python?

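The consensus score is the sum of the pairwise scores between the sequences and the consensus sequence: Consensus(A) = \sum_{i=1}^{l} d(i), where l is the length of the sequences and d is the distance between two bases. For example: d(A,B) = 2 for A != B, d(A,-) = d(-,A) = 1 for A != '-', and 0 otherwise. In the example above, A and B can each be 'A', 'C', 'G' or 'T'.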

     we calculate distance between seq1 and seq2 then seq1 and seq3 then seq2 and seq3

**seq1 and seq2:**
d(A,A)=0, d(G,G)=0, d(-,-)=0, d(c,c)=0, d(t,t)=0
**seq1 and seq3**:
d(A,A)=0, d(G,C)=2, d(-,T)=1, d(c,-)=1, d(t,t)=0
**seq2 and seq3**:
d(A,A)=0, d(G,C)=2, d(-,T)=1, d(c,-)=1, d(t,t)=0


         seq1= A  G  _  C  T
         seq2= A  G  _  C  T
         seq3= A  C  T  _  T
               0  0  0  0  0
               0  2  1  1  0
               0  2  1  1  0
               ++++++++++++++
               0+ 4+ 2+ 2+ 0= 8

Consensus(A) = 8

Weighted sum of pairs: WSP(A) = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \sum_{h=1}^{l} w_{ij} * s(A[i,h], A[j,h]), where l is the length of the sequences, k is the number of sequences, and w_{ij} is the weight of sequences i and j.

s(A,B) = 2 for A != B, s(A,-) = s(-,A) = -1 for A != '-', and 3 otherwise. All weight factors are 1.

             seq1= A  G  _  C  T
             seq2= A  G  _  C  T
             seq3= A  C  T  _  T
                   3  3  3  3  3
                   3  2 -1 -1  3
                   3  2 -1 -1  3
                   ++++++++++++++
                  (3+3+3)*1+(3+2+2)*1+(3-1-1)*1+(3-1-1)*1+(3+3+3)*1=9*1+7*1+1*1+1*1+9*1
                   9+7+1+1+9=27

So the WSP score of the three sequences is 27.

1 answer:

Answer 0 (score: 0)

I would approach it as follows. First, create functions to calculate the individual distances and weighted sums:

def distance(a, b):
    """Distance between two bases a and b."""
    if a == b:
        return 0
    elif a == "_" or b == "_":
        return 1
    else:
        return 2

def w_sum(a, b, w=1):
    """Calculate the pair sum of bases a and b with weighting w."""
    if a == b:
        return 3 * w
    elif a == "_" or b == "_":
        return -1 * w
    else:
        return 2 * w

Second, use the zip function to create the sets of bases at the same position:

list(zip(seq1, seq2, seq3)) == [('A', 'A', 'A'), 
                                ('G', 'G', 'C'), 
                                ('_', '_', 'T'), 
                                ('C', 'C', '_'), 
                                ('T', 'T', 'T')]
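
The same call extends to any number of sequences by unpacking a list with `zip(*seqs)`. The following is a small sketch, assuming the sequences are collected in a list named `seqs` (a name that does not appear in the original) and that a length check is wanted, since `zip` silently stops at the shortest input:

```python
seq1 = 'AG_CT'
seq2 = 'AG_CT'
seq3 = 'ACT_T'
seqs = [seq1, seq2, seq3]  # hypothetical container name, not from the original

# zip truncates to the shortest input, so check the alignment is rectangular
if len({len(s) for s in seqs}) != 1:
    raise ValueError("aligned sequences must all have the same length")

columns = list(zip(*seqs))  # one tuple of bases per alignment column
print(columns[1])  # ('G', 'G', 'C')
```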

Third, use itertools.combinations to generate the pairs within each position:

list(combinations(('G', 'G', 'C'), 2)) == [('G', 'G'), 
                                           ('G', 'C'), 
                                           ('G', 'C')]
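
If the weights w_ij were not all 1, the indices of the two sequences would be needed alongside the bases. One possible way to get them, sketched here with a hypothetical `weights` dictionary keyed by index pairs, is to run `combinations` over `range(k)` instead of over the bases themselves:

```python
from itertools import combinations

column = ('G', 'G', 'C')  # bases of seq1, seq2, seq3 at one position
# hypothetical per-pair weights w_ij (all 1 in the question)
weights = {(0, 1): 1, (0, 2): 1, (1, 2): 1}

# index pairs (i, j) with i < j, in the same order as the base pairs
index_pairs = list(combinations(range(len(column)), 2))
for i, j in index_pairs:
    print(i, j, column[i], column[j], weights[(i, j)])
```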

Finally, add up the distances and pair sums:

from itertools import combinations

consensus = 0
wsp = 0
for position in zip(seq1, seq2, seq3): # sets at same position
    for pair in combinations(position, 2): # pairs within set
        consensus += distance(*pair) # calculate distance
        wsp += w_sum(*pair) # calculate pair sum

Note the use of *pair to unpack the 2-tuple of bases into the two arguments of the scoring functions.
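
For convenience, the steps can be put together in one self-contained script. This is only a sketch under a few assumptions: the names `pair_score` and `score_alignment` and the optional `weights` dictionary are illustrative and not part of the original answer, and `_` is taken as the gap character, as in the sequences in the question.

```python
from itertools import combinations


def distance(a, b):
    """Distance between two bases: 0 if equal, 1 if one is a gap, else 2."""
    if a == b:
        return 0
    if a == "_" or b == "_":
        return 1
    return 2


def pair_score(a, b):
    """WSP score s(a, b): 3 if equal, -1 if one is a gap, else 2."""
    if a == b:
        return 3
    if a == "_" or b == "_":
        return -1
    return 2


def score_alignment(seqs, weights=None):
    """Return (consensus, wsp) for a list of equal-length aligned sequences.

    weights is an optional dict mapping index pairs (i, j), i < j, to w_ij;
    pairs that are not listed default to a weight of 1.
    """
    weights = weights or {}
    consensus = 0
    wsp = 0
    for column in zip(*seqs):  # bases at the same position
        for i, j in combinations(range(len(column)), 2):  # sequence pairs
            consensus += distance(column[i], column[j])
            wsp += weights.get((i, j), 1) * pair_score(column[i], column[j])
    return consensus, wsp


print(score_alignment(['AG_CT', 'AG_CT', 'ACT_T']))  # (8, 27)
```

With all weights equal to 1 this reproduces the values worked out by hand in the question: a consensus score of 8 and a WSP score of 27.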