Error when aligning substrings

Time: 2015-11-29 17:19:24

Tags: python regex string

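I want to build an adjacency matrix based on the directional alignment of the substrings in a set. Because of a bug in my overlap function, I can't get the result I want. I only want to perform directional alignment, as shown for the substrings in the attached image.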

import numpy
import array
a='{ATG,TGG,TGC,GTG,GGC,GCA,GCG,CGT}'
p= dict(enumerate(a[1:-1].split(",")))
print p    
n= p.keys()[-1]
n+=1
print p.keys()
def overlap(string1, string2):
    answer = ""
    len1=len(string2)
    for i in range(len1):
        match = ""
        for j in range(0,len1):
            if (i + j < len1 and string1[i+j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)):
                    answer = match
                    match = ""
    return answer
M=numpy.zeros([n,n],int)  # define an n x n matrix of zeros
print M
for k in range(0,n):
    for l in range(0,n):
        if k==l: #in matrix M let diagonal elements be 0
            pass
        elif len(str(overlap(p[k],p[l])))>0:  #if there is overlap as shown in the figure,then add 1 to the matrix.
            M[k,l]+=1
        else:
            pass           
print M

1 answer:

Answer 0 (score: 1)

The symptom of your bug shows up in cases like overlap('CGT', 'ATG'): instead of returning '', it returns 'T'. You can see it happen here.
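A quick way to confirm the symptom is to run the question's `overlap` on that pair. The function below is the one from the question with only its indentation normalized; it uses no Python 2-only syntax, so it runs under Python 3 as well.

```python
def overlap(string1, string2):
    # Copied from the question, bug included: it accumulates runs of matching
    # characters without checking that a run starts at string2[0] and ends
    # at the last character of string1.
    answer = ""
    len1 = len(string2)
    for i in range(len1):
        match = ""
        for j in range(0, len1):
            if i + j < len1 and string1[i + j] == string2[j]:
                match += string2[j]
            else:
                if len(match) > len(answer):
                    answer = match
                    match = ""
    return answer


print(repr(overlap('CGT', 'ATG')))  # 'T', although the expected result is ''
```

The stray 'T' comes from offset i = 1, j = 1, where the final 'T' of 'CGT' is compared with the middle 'T' of 'ATG'.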

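Nothing in your code prevents it from producing matches on substrings that are not at the end of string1 and the start of string2 (in the example above, the 'T' at the end of 'CGT' is matched against the middle of 'ATG'). You can fix the code by checking, before you store the current match as answer, that the match really sits at the end of string1 and at the beginning of string2.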
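One way to apply that check is sketched below. This is an assumption about how the loop could be restructured, not code from the answer. For each start offset `i` in `string1`, it compares characters against `string2` starting from its first character. A run is accepted only if it reaches the last character of `string1`. Offsets are tried left to right, so the first accepted run is also the longest overlap.

```python
def overlap(string1, string2):
    """Return the longest suffix of string1 that is also a prefix of string2."""
    for i in range(len(string1)):  # i: where the candidate suffix starts in string1
        match = ""
        j = 0  # always compare from the first character of string2
        while (i + j < len(string1) and j < len(string2)
               and string1[i + j] == string2[j]):
            match += string2[j]
            j += 1
        # Accept the run only if it reached the end of string1
        if i + j == len(string1):
            return match
    return ""


print(repr(overlap('CGT', 'ATG')))  # ''
print(repr(overlap('ATG', 'TGG')))  # 'TG'
```

Identical strings return the whole string (`overlap('ATG', 'ATG') == 'ATG'`). The question's matrix loop already skips the diagonal, so that case never reaches the matrix.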

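Alternatively, if your problem is restricted to input strings of the same length, you can generate all candidate substrings and try to match them, as in the following code: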

import itertools

def subs_until_end(str1):
    "All substrings obtained by moving the slice start, longest first"
    for i in range(len(str1)):
        yield str1[i:]

def subs_until_start(str1):
    "All substrings obtained by moving the slice end, longest first"
    for i in range(len(str1), 0, -1):
        yield str1[:i]


def overlap(string1, string2):
    for sub1, sub2 in itertools.izip(subs_until_end(string1),
                                     subs_until_start(string2)):
        # print "Trying %s vs %s" % (sub1, sub2)
        if sub1 == sub2:
            return sub1
    return ""

Just uncomment the print line, or use this tool, to see how it works.
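To connect this back to the adjacency matrix in the question, here is a Python 3 sketch. It rests on a few assumptions. Instead of pairing the two substring generators, it checks each suffix of `string1` with `str.startswith`. For equal-length strings this gives the same result, and it also works when the lengths differ. A plain list of lists replaces the NumPy matrix so the sketch has no dependencies. `adjacency_matrix` and its `min_len` parameter are hypothetical names introduced here.

```python
def overlap(string1, string2):
    """Longest suffix of string1 that is also a prefix of string2 ('' if none)."""
    # Try suffixes longest first, like subs_until_end in the answer.
    for i in range(len(string1)):
        suffix = string1[i:]
        if string2.startswith(suffix):
            return suffix
    return ""


def adjacency_matrix(set_literal, min_len=1):
    """Hypothetical helper: M[i][j] = 1 when kmers[i] overlaps into kmers[j].

    min_len=1 mirrors the question's `len(...) > 0` test.
    """
    kmers = set_literal.strip("{}").split(",")
    n = len(kmers)
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # The diagonal stays 0, as in the question.
            if i != j and len(overlap(kmers[i], kmers[j])) >= min_len:
                M[i][j] = 1
    return kmers, M


kmers, M = adjacency_matrix('{ATG,TGG,TGC,GTG,GGC,GCA,GCG,CGT}')
for name, row in zip(kmers, M):
    print(name, row)
```

With this `overlap`, the CGT → ATG entry (`M[7][0]`) is 0, as the answer expects. If the figure shows only overlaps of two characters (k - 1 for these 3-mers), passing `min_len=2` would keep just those edges. That reading of the figure is an assumption.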