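I have a list of two strings (each contains a sequence plus some spaces). I need to walk over both strings in parallel, compare them character by character, and count the positions where neither string has a space.
I have the following, but it is too slow for my needs. Is there a way to speed it up?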
from itertools import izip

def overlap(sequence_pair):
    # Count positions where neither string has a space
    return sum(nucleotide1 != ' ' and nucleotide2 != ' '
               for nucleotide1, nucleotide2 in izip(*sequence_pair))

if __name__ == '__main__':
    sequence_pair = [' AT GT ',
                     ' GTAGCG ']
    print overlap(sequence_pair)
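The snippet above is Python 2 (`izip`, the `print` statement). A minimal Python 3 port is sketched below, assuming only the built-in `zip`, which is lazy in Python 3 and takes over the role of `izip`; like `izip`, it stops at the end of the shorter string. The logic is unchanged, so this alone does not make it faster.

```python
def overlap(sequence_pair):
    """Count positions where neither string has a space.

    zip() is lazy in Python 3 and stops at the shorter string, like izip.
    """
    return sum(a != ' ' and b != ' ' for a, b in zip(*sequence_pair))


if __name__ == '__main__':
    sequence_pair = [' AT GT ',
                     ' GTAGCG ']
    print(overlap(sequence_pair))
```

For the sample pair exactly as written here, this prints 4.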
Answer:
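Optimizing this in pure Python will be difficult, but if you work with NumPy arrays instead of Python lists/strings from the start, you can get a significant speedup (roughly 6x in the timings below):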
>>> import numpy as np
>>> sequence_pair = [' AT GT '*10000, ' GTAGCG '*10000]
>>> sequence_pair_arr = np.array([list(' AT GT '*10000), list(' GTAGCG '*10000)])
>>> %timeit overlap(sequence_pair)
100 loops, best of 3: 14 ms per loop
>>> %timeit np.all(sequence_pair_arr != ' ', axis=0).sum()
100 loops, best of 3: 2.2 ms per loop
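A variant of the same vectorized idea that skips building Python lists is sketched below. It views each string's bytes as a `uint8` array with `np.frombuffer` and compares against `ord(' ')`. This assumes ASCII-only sequences (one character per byte), and the helper name `overlap_np` is hypothetical. Unlike `np.array([list(a), list(b)])`, which needs both strings to have the same length to form a 2-D character array, it truncates to the shorter string just as `izip` does.

```python
import numpy as np


def overlap_np(seq1, seq2):
    """Hypothetical helper: count positions where neither string has a space.

    Assumes ASCII input, so each character maps to exactly one byte.
    """
    a = np.frombuffer(seq1.encode('ascii'), dtype=np.uint8)
    b = np.frombuffer(seq2.encode('ascii'), dtype=np.uint8)
    n = min(a.size, b.size)  # mimic izip/zip: stop at the shorter string
    space = ord(' ')
    both = (a[:n] != space) & (b[:n] != space)  # elementwise boolean mask
    return int(np.count_nonzero(both))


if __name__ == '__main__':
    print(overlap_np(' AT GT ', ' GTAGCG '))
```

For the sample pair from the question this also gives 4. If you call it repeatedly on the same sequences, converting to arrays once up front (as the answer suggests) avoids paying the encoding cost each time.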