I tried writing the following program:
import numpy as np  # import package for scientific computing
dna1 = str(np.load('dna1.npy'))
def count(dna1, repeat):
    i = 0
    for s in range(len(dna1)):
        if (s == 'repeat'):
            i += 1
            s += dna1[0:1]
    return i
repeat = 'TTTT'
n = count(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=n))
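I want to extract every possible 4-letter window from my list and check whether it equals 'TTTT'. But I don't know how to increment so that I shift by one position in the list while still reading 4 letters.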
Answer 0 (score: 0)
I agree that a regular expression is probably the easiest first approach:
import numpy as np  # import package for scientific computing
import re
dna1 = str(np.load('dna1.npy'))
def count(dna1, repeat):
    regex = re.compile(repeat)
    # findall returns non-overlapping matches, scanning left to right
    result = regex.findall(dna1)
    return len(result)
repeat = 'TTTT'
n = count(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=n))
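If overlapping occurrences should count as well (the "shift by one position" behavior described in the question), one option is a zero-width lookahead. This is only a minimal sketch: the helper name `count_overlapping` and the `sample` string are made up for illustration and are not the contents of dna1.npy.

```python
import re

def count_overlapping(dna, repeat):
    # re.escape guards against regex metacharacters in the pattern.
    # The lookahead (?=...) matches without consuming characters,
    # so every start position in the string gets tested.
    pattern = re.compile('(?=' + re.escape(repeat) + ')')
    return sum(1 for _ in pattern.finditer(dna))

sample = 'ATTTTTGTTTTA'  # hypothetical sequence for illustration only
print(count_overlapping(sample, 'TTTT'))   # overlapping count
print(len(re.findall('TTTT', sample)))     # non-overlapping count
```

On this sample the two counts differ, because the run of five T's contains 'TTTT' at two start positions.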
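Edit:
Here is a simple approach that doesn't use the regex module; you could of course optimize it to skip ahead based on the result of each iteration: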
def count(dna1, repeat):
    repeat_length = len(repeat)
    total = 0
    idx = 0
    while idx < len(dna1):
        substr = dna1[idx:idx+repeat_length]
        if substr == repeat:
            total += 1
            idx += repeat_length  # skip ahead to avoid repeat counting
        else:
            idx += 1
    return total
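For what it's worth, Python's built-in `str.count` also counts non-overlapping occurrences from left to right, so it should give the same result as the loop above for a non-empty pattern. A small sketch, where the wrapper name `count_non_overlapping` is only an illustrative choice:

```python
def count_non_overlapping(dna, repeat):
    # str.count returns the number of non-overlapping occurrences
    # of the substring, scanning from left to right.
    return dna.count(repeat)

print(count_non_overlapping('TTTTTTTT', 'TTTT'))  # eight T's
print(count_non_overlapping('TTTTT', 'TTTT'))     # five T's
```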
Answer 1 (score: 0)
The best and most customizable way is something like this:
import numpy as np  # import package for scientific computing
dna1 = str(np.load('dna1.npy'))
repeat = 'TTTT'
def get_num_of_repeats(dna, repeat):
    repeats = 0
    for i in range(len(dna) - len(repeat) + 1):
        if dna[i:i+len(repeat)] == repeat:
            repeats += 1
    return repeats
repeats = get_num_of_repeats(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=repeats))
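I simply created a function, get_num_of_repeats, which takes the dna variable and the pattern to look for and returns the number of repeats. Depending on how you want the algorithm to behave, you may run into trouble when looking for a pattern such as 'TTTT' and part of the dna contains 'TTTTT': this function checks every start position, so it counts that stretch as two overlapping matches. I can follow up to help define the behavior you want.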
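To make the 'TTTTT' case concrete, here is a rough sketch that makes the overlap behavior an explicit choice. The function name `count_repeats` and its `overlapping` flag are hypothetical and not part of the code above; the search relies on the standard `str.find` method.

```python
def count_repeats(dna, repeat, overlapping=True):
    if not repeat:
        raise ValueError('repeat must be non-empty')
    # After a match, advance by 1 to allow overlapping matches,
    # or by the pattern length to skip past the whole match.
    step = 1 if overlapping else len(repeat)
    total = 0
    idx = dna.find(repeat)
    while idx != -1:
        total += 1
        idx = dna.find(repeat, idx + step)
    return total

print(count_repeats('TTTTT', 'TTTT', overlapping=True))
print(count_repeats('TTTTT', 'TTTT', overlapping=False))
```

With overlapping=True, 'TTTTT' yields two matches (starting at positions 0 and 1); with overlapping=False it yields one.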