匹配字符

时间:2015-02-28 01:59:13

标签: python

我是python的新手,我需要帮助来解决这个错误。 我有两个字典看起来像这样:

OtherSeqDict

{'Protein1':'AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC', 
 'Protein2':'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC',
 'Protein3':'AGCGGGTTTTTACCCCCCGTGTTGGGACCCCCACTGCGTC'}

MouseSeqDict

{'Protein4':'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'}

我正在尝试打印(i)将蛋白质4的值中的字符与Protein1,Protein2和Protein 3的值中的字符匹配 (ii)蛋白质4中的蛋白质1,蛋白质2和蛋白质3的不匹配特征以及蛋白质4中这些错配特征的位置。

我目前正在处理我的第一个问题并编辑了我在网上找到的脚本但是在运行时收到了错误

错误如下所示

p = _cache.get(cachekey)
  

TypeError:不可用类型:' list'

这是我的剧本:

otherseq=OtherSeqDict.values()
mouseseq=MouseSeqDict.values()

for match in re.finditer(mouseseq,otherseq):

        start=match.start()

        end=match.end()

        print 'Found "%s" at %d:%d' %(text[start:end],start,end)

有谁能告诉我如何做到这一点(i)和(ii)?

谢谢!

2 个答案:

答案 0 :(得分:0)

OtherSeqDict={'Protein1':'AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC', 'Protein2':'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC', 'Protein3':'AGCGGGTTTTTACCCCCCGTGTTGGGACCCCCACTGCGTC'}

MouseSeqDict = {'Protein4':'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'}

ms_v = MouseSeqDict['Protein4']

# get common indexes
for k, v in OtherSeqDict.items():
    # zip  MouseSeqDict value string and current value string
    # use enumerate to get the index, adding it if we find common elements at the same index from each string
    print("matched: {} {}".format(k,[i for i, tup in enumerate(zip(ms_v,  v)) if tup[0] == tup[1]]))
    print("unmatched : {} {}".format(k,[i for i, tup in enumerate(zip(ms_v,  v)) if tup[0] != tup[1]]))

matched: Protein3 [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
unmatched : Protein3 [5, 29]
matched: Protein2 [0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 18, 19, 21, 22, 23, 24, 27, 28, 30]
unmatched : Protein2 [5, 10, 11, 14, 15, 16, 17, 20, 25, 26, 29, 31, 32, 33, 34, 35, 36, 37, 38]
matched: Protein1 [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
unmatched : Protein1 [5, 20, 29]

要使任何共同元素假定顺序无关紧要,您可以使用set.intersection

for k, v in OtherSeqDict.items():
    print(k, set(v).intersection(ms_v))

{'C', 'T', 'G', 'A'}
{'C', 'T', 'G', 'A'}
{'C', 'T', 'G', 'A'}

你也可以在循环中为每个元组添加索引和字母,这可能更有用:

for k, v in OtherSeqDict.items():
    print("matched: {} {}".format(k,[(i,tup[0]) for i, tup in enumerate(zip(ms_v,  v)) if tup[0] == tup[1]]))
    print("unmatched : {} {}".format(k,[(i,)+tup for i, tup in enumerate(zip(ms_v,  v)) if tup[0] != tup[1]])



matched: Protein2 [(0, 'A'), (1, 'G'), (2, 'C'), (3, 'G'), (4, 'G'), (6, 'T'), (7, 'T'), (8, 'T'), (9, 'T'), (12, 'C'), (13, 'C'), (18, 'G'), (19, 'T'), (21, 'T'), (22, 'T'), (23, 'G'), (24, 'G'), (27, 'C'), (28, 'C'), (30, 'C')]
unmatched : Protein2 [(5, 'C', 'G'), (10, 'T', 'A'), (11, 'A', 'C'), (14, 'C', '-'), (15, 'C', '-'), (16, 'C', '-'), (17, 'C', 'G'), (20, 'G', 'T'), (25, 'G', 'A'), (26, 'A', 'C'), (29, 'G', 'C'), (31, 'C', 'A'), (32, 'A', 'C'), (33, 'C', 'T'), (34, 'T', 'G'), (35, 'G', 'C'), (36, 'C', 'G'), (37, 'G', 'T'), (38, 'T', 'C')]
matched: Protein1 [(0, 'A'), (1, 'G'), (2, 'C'), (3, 'G'), (4, 'G'), (6, 'T'), (7, 'T'), (8, 'T'), (9, 'T'), (10, 'T'), (11, 'A'), (12, 'C'), (13, 'C'), (14, 'C'), (15, 'C'), (16, 'C'), (17, 'C'), (18, 'G'), (19, 'T'), (21, 'T'), (22, 'T'), (23, 'G'), (24, 'G'), (25, 'G'), (26, 'A'), (27, 'C'), (28, 'C'), (30, 'C'), (31, 'C'), (32, 'A'), (33, 'C'), (34, 'T'), (35, 'G'), (36, 'C'), (37, 'G'), (38, 'T'), (39, 'C')]
unmatched : Protein1 [(5, 'C', 'G'), (20, 'G', 'T'), (29, 'G', 'C')]
matched: Protein3 [(0, 'A'), (1, 'G'), (2, 'C'), (3, 'G'), (4, 'G'), (6, 'T'), (7, 'T'), (8, 'T'), (9, 'T'), (10, 'T'), (11, 'A'), (12, 'C'), (13, 'C'), (14, 'C'), (15, 'C'), (16, 'C'), (17, 'C'), (18, 'G'), (19, 'T'), (20, 'G'), (21, 'T'), (22, 'T'), (23, 'G'), (24, 'G'), (25, 'G'), (26, 'A'), (27, 'C'), (28, 'C'), (30, 'C'), (31, 'C'), (32, 'A'), (33, 'C'), (34, 'T'), (35, 'G'), (36, 'C'), (37, 'G'), (38, 'T'), (39, 'C')]
unmatched : Protein3 [(5, 'C', 'G'), (29, 'G', 'C')]

答案 1 :(得分:0)

这是一种方法:

otherseq = {'Protein1':'AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC', 
 'Protein2':'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC',
 'Protein3':'AGCGGGTTTTTACCCCCCGTGTTGGGACCCCCACTGCGTC'}

mouseseq = {'Protein4':'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'}

def compareSeqs(seq1, seq2):
    matches = [k for k, v in enumerate(zip(seq1, seq2)) if v[0] == v[1]]
    mismatches = [k for k, v in enumerate(zip(seq1, seq2)) if v[0] != v[1]]
    return (matches, mismatches)

def compareGroups(group1, group2):
    for name1 in group1:
        for name2 in group2:
            seq1 = group1[name1]
            seq2 = group2[name2]
            matches, mismatches = compareSeqs(seq1, seq2)
            print "Comparing "+name1+" vs "+name2+":"
            print "\tMatches:    ", matches
            print "\tMismatches: ", mismatches

compareGroups(mouseseq, otherseq)

输出:

Comparing Protein4 vs Protein3:
    Matches:     [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
    Mismatches:  [5, 29]
Comparing Protein4 vs Protein2:
    Matches:     [0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 18, 19, 21, 22, 23, 24, 27, 28, 30]
    Mismatches:  [5, 10, 11, 14, 15, 16, 17, 20, 25, 26, 29, 31, 32, 33, 34, 35, 36, 37, 38]
Comparing Protein4 vs Protein1:
    Matches:     [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
    Mismatches:  [5, 20, 29]