I have two sequence files. Say ham1.txt:
AAACCCTTTGGG
AGGTACTTTTTT
TCTCTTTTTTTT
and so on
ham2.txt:
AAACCCTTTGGG
GAGAGGGAGGGC
AGGTACTTTTTT
CTCTTAATTTCC
TCTCTTTTTTTT
GTTTTTAAAAAA
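I want to match each sequence in ham1.txt with a sequence in ham2.txt, choosing the pair with the smallest Hamming distance. My Python code prints the Hamming distance between every pair, but I only want the best match. Here is my code: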
def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

with open('ham1.txt','r') as file1:
    for s1 in file1:
        with open('ham2.txt','r') as file2:
            for s2 in file2:
                dist = hamming_distance(s1,s2)
                print s1,s2,dist
Can you suggest an edit? Thanks.
Answer 0: (score: 1)
You should take a look at itertools.product:
In [7]:
L1 = ['AAACCCTTTGGG',
'AGGTACTTTTTT',
'TCTCTTTTTTTT']
L2 = ['AAACCCTTTGGG',
'GAGAGGGAGGGC',
'AGGTACTTTTTT',
'CTCTTAATTTCC',
'TCTCTTTTTTTT',
'GTTTTTAAAAAA']
def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
import itertools
res = [[hamming_distance(*item), item[0], item[1]] for item in itertools.product(L1, L2)]
sorted(res)[0]
Out[7]:
[0, 'AAACCCTTTGGG', 'AAACCCTTTGGG']
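Note that `sorted(res)[0]` returns only the single best pair overall. If the goal is the closest match for every sequence in the first list, a minimal sketch could look like the following. The helper name `best_matches` is hypothetical (it is not part of the answer above), and the tie-breaking rule (lexicographically smallest candidate) is an assumption that falls out of comparing tuples.

```python
def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def best_matches(queries, candidates):
    """Map each query to a (closest candidate, distance) tuple.

    Hypothetical helper: ties are broken by the lexicographically
    smallest candidate, because min() compares (distance, candidate) tuples.
    """
    result = {}
    for query in queries:
        dist, match = min((hamming_distance(query, c), c) for c in candidates)
        result[query] = (match, dist)
    return result


L1 = ['AAACCCTTTGGG', 'AGGTACTTTTTT', 'TCTCTTTTTTTT']
L2 = ['AAACCCTTTGGG', 'GAGAGGGAGGGC', 'AGGTACTTTTTT',
      'CTCTTAATTTCC', 'TCTCTTTTTTTT', 'GTTTTTAAAAAA']

for query, (match, dist) in best_matches(L1, L2).items():
    print(query, match, dist)
```

This still computes every pairwise distance, as the `itertools.product` version does, but keeps only one result per query.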
Answer 1: (score: 0)
I generated the following list:
0 AAACCCTTTGGG AAACCCTTTGGG
0 AGGTACTTTTTT AGGTACTTTTTT
0 TCTCTTTTTTTT TCTCTTTTTTTT
6 AGGTACTTTTTT TCTCTTTTTTTT
6 TCTCTTTTTTTT AGGTACTTTTTT
7 AAACCCTTTGGG AGGTACTTTTTT
7 AGGTACTTTTTT AAACCCTTTGGG
8 AAACCCTTTGGG TCTCTTTTTTTT
8 AGGTACTTTTTT CTCTTAATTTCC
8 TCTCTTTTTTTT AAACCCTTTGGG
8 TCTCTTTTTTTT CTCTTAATTTCC
9 AAACCCTTTGGG GAGAGGGAGGGC
9 TCTCTTTTTTTT GTTTTTAAAAAA
10 AAACCCTTTGGG CTCTTAATTTCC
11 AGGTACTTTTTT GAGAGGGAGGGC
11 AGGTACTTTTTT GTTTTTAAAAAA
12 AAACCCTTTGGG GTTTTTAAAAAA
12 TCTCTTTTTTTT GAGAGGGAGGGC
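I think this is what you need, right?
To achieve this, we use a few libraries.
First, I convert the data stream/string into a list of values; then I take every possible combination of ham1 and ham2 and build a new list that includes the Hamming distance for each pair; finally, I sort that list.
Does this help? If not, just ask and I will help you work it out ;)
The code used is below.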
from distance import hamming
from collections import Counter
from itertools import product
ham1="""
AAACCCTTTGGG
AGGTACTTTTTT
TCTCTTTTTTTT
"""
ham2="""
AAACCCTTTGGG
GAGAGGGAGGGC
AGGTACTTTTTT
CTCTTAATTTCC
TCTCTTTTTTTT
GTTTTTAAAAAA
"""
ham1data = filter(None, ham1.splitlines())
ham2data = filter(None, ham2.splitlines())
res = [(hamming(h1, h2), h1, h2) for h1, h2 in product(ham1data, ham2data)]
for v, h1, h2 in sorted(res):
    print v, h1, h2
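The same steps (turn the input into a list, pair everything up, attach the distance, sort) can also be sketched with the standard library only, reading from file-like objects instead of inline strings. In this sketch, `hamming` is a local stand-in for the third-party function imported above, `read_sequences` and `ranked_pairs` are hypothetical helper names, and `io.StringIO` stands in for `open('ham1.txt')` and `open('ham2.txt')` so the example is self-contained.

```python
import io
from itertools import product


def hamming(s1, s2):
    """Local stand-in for a Hamming distance function (equal lengths only)."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def read_sequences(handle):
    """Return the non-empty lines of a file-like object, stripped of whitespace."""
    return [line.strip() for line in handle if line.strip()]


def ranked_pairs(seqs1, seqs2):
    """Return (distance, seq1, seq2) tuples for every pair, smallest distance first."""
    return sorted((hamming(a, b), a, b) for a, b in product(seqs1, seqs2))


# Assumption: these in-memory buffers mimic the contents of ham1.txt and ham2.txt.
ham1 = io.StringIO("AAACCCTTTGGG\nAGGTACTTTTTT\nTCTCTTTTTTTT\n")
ham2 = io.StringIO("AAACCCTTTGGG\nGAGAGGGAGGGC\nAGGTACTTTTTT\n"
                   "CTCTTAATTTCC\nTCTCTTTTTTTT\nGTTTTTAAAAAA\n")

pairs = ranked_pairs(read_sequences(ham1), read_sequences(ham2))
for dist, a, b in pairs:
    print(dist, a, b)
```

Stripping each line matters when reading from real files: a trailing newline would otherwise be compared as part of the sequence.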
Answer 2: (score: 0)
I would use functools.reduce:
from functools import reduce

def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == '__main__':
    with open('h1.txt') as f:
        f1 = f.read().splitlines()
    with open('h2.txt') as f:
        f2 = f.read().splitlines()
    for line in f1:
        print(line, reduce(lambda x, y: x if hamming_distance(line, y) > hamming_distance(line, x) else y, f2))
Output:
AAACCCTTTGGG AAACCCTTTGGG
AGGTACTTTTTT AGGTACTTTTTT
TCTCTTTTTTTT TCTCTTTTTTTT
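As a possible alternative to `reduce`, the built-in `min` with a `key` function expresses the same selection more directly. This is a sketch, not the answerer's method: `closest` is a hypothetical helper name, and the lists below are hard-coded stand-ins for the file contents.

```python
def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def closest(line, candidates):
    """Return the candidate with the smallest Hamming distance to line.

    Hypothetical helper: on ties, min() returns the first minimal candidate.
    """
    return min(candidates, key=lambda c: hamming_distance(line, c))


# Assumption: these lists stand in for the lines read from the two files.
f1 = ['AAACCCTTTGGG', 'AGGTACTTTTTT', 'TCTCTTTTTTTT']
f2 = ['AAACCCTTTGGG', 'GAGAGGGAGGGC', 'AGGTACTTTTTT',
      'CTCTTAATTTCC', 'TCTCTTTTTTTT', 'GTTTTTAAAAAA']

for line in f1:
    print(line, closest(line, f2))
```

Two differences from the `reduce` version are worth noting: `min` calls the key once per candidate, whereas the lambda above recomputes the distance of the running best at every step; and on ties `min` keeps the first minimal candidate, while the `reduce` lambda keeps the last one.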