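I'm trying to write a program that takes several input sequences (strings) of variable length and counts, for each pair of strings, the number of identical characters at the same index, in order to find the strings with the highest number of matches. If the strings differ in length, I want to append 'N' characters to the end until they are all the same length.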
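For the padding step described above, a minimal sketch might look like the following. The helper name `pad_sequences` and the sample strings are hypothetical and only there for illustration.

```python
def pad_sequences(seqs, fill='N'):
    """Right-pad every sequence with `fill` up to the longest length."""
    # Assumes `seqs` is non-empty; max() raises ValueError otherwise.
    longest = max(len(s) for s in seqs)
    return [s.ljust(longest, fill) for s in seqs]

padded = pad_sequences(['ACGT', 'AC', 'ACGTTA'])
print(padded)  # ['ACGTNN', 'ACNNNN', 'ACGTTA']
```

Note that two padded positions would compare equal ('N' against 'N'), so it may be worth skipping 'N' when counting matches.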
Edit: a new angle I'm working on:
import random

def DNA(length):
    # Return a random DNA sequence of the given length.
    return ''.join(random.choice('CGTA') for i in range(length))

seq_lst = [DNA(20) for i in range(5)]
print(seq_lst)
from collections import OrderedDict

# One counter per unordered pair of sequences, keyed like '_0_to_1count'.
count_dict = OrderedDict()
for indexa in range(len(seq_lst)):
    for indexb in range(indexa + 1, len(seq_lst)):
        key = '_%d_to_%dcount' % (indexa, indexb)
        count_dict[key] = 0
        # Compare the two sequences position by position,
        # stopping at the end of the shorter one.
        shorter = min(len(seq_lst[indexa]), len(seq_lst[indexb]))
        for indexc in range(shorter):
            if seq_lst[indexa][indexc] == seq_lst[indexb][indexc]:
                count_dict[key] += 1

print(count_dict)
Answer 0 (score: 1)
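I don't follow part of what you're asking, but I'll offer a good way of comparing two strings, which you can then use to compare the entries of a dictionary built from the n permutations.
There is a concept in computer science and mathematics called Levenshtein distance, which measures the difference between two sequences of characters.
It is used in text analysis, among other fields, and gives you a computational way to compare strings: it tells you the minimum number of insertions, deletions, and substitutions needed to turn one string into another.
Some pseudocode for Levenshtein distance: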
function LevenshteinDistance(char s[1..m], char t[1..n]):
    // for all i and j, d[i,j] will hold the Levenshtein distance between
    // the first i characters of s and the first j characters of t
    // note that d has (m+1)*(n+1) values
    declare int d[0..m, 0..n]

    set each element in d to zero

    // source prefixes can be transformed into empty string by
    // dropping all characters
    for i from 1 to m:
        d[i, 0] := i

    // target prefixes can be reached from empty source prefix
    // by inserting every character
    for j from 1 to n:
        d[0, j] := j

    for j from 1 to n:
        for i from 1 to m:
            // s and t are indexed from 1, so s[i] is the i-th character
            if s[i] = t[j]:
                substitutionCost := 0
            else:
                substitutionCost := 1

            d[i, j] := minimum(d[i-1, j] + 1,                   // deletion
                               d[i, j-1] + 1,                   // insertion
                               d[i-1, j-1] + substitutionCost)  // substitution

    return d[m, n]
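The pseudocode above translates fairly directly to Python. This is only a sketch of the plain dynamic-programming version; the function name `levenshtein` is my own choice, and no attempt is made to save memory.

```python
def levenshtein(s, t):
    """Return the Levenshtein distance between strings s and t."""
    m, n = len(s), len(t)
    # d[i][j] holds the distance between s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # drop all i characters of the source prefix
    for j in range(1, n + 1):
        d[0][j] = j  # insert all j characters of the target prefix
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            # Python strings are 0-indexed, hence i - 1 and j - 1.
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

print(levenshtein('kitten', 'sitting'))  # 3
```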
A similar alternative is the Damerau–Levenshtein distance, which also counts swapping two adjacent characters as a single edit.
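To show how the transposition rule changes the result, here is a sketch of the restricted variant, usually called optimal string alignment. It is simpler than the full Damerau–Levenshtein distance and can give a larger value in some cases. The name `osa_distance` is hypothetical.

```python
def osa_distance(s, t):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    m, n = len(s), len(t)
    # d[i][j] holds the distance between s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Two adjacent characters swapped: count as one edit.
            if (i > 1 and j > 1
                    and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]

print(osa_distance('ACGT', 'AGCT'))  # 1 (plain Levenshtein would give 2)
```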
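In my experience, this is what I'd suggest for finding the closest match: compute the distance (using either of the two formulas) between each pair of DNA sequences, then pick the pair with the smallest distance across all comparisons. Since this sounds like a homework problem and you need to learn, I don't want to hand you a solution, but there are plenty of resources online, and this should give you a good start on what you're trying to do.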
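As a starting point rather than a finished solution, the selection step could be sketched like this. Both `closest_pair` and `mismatch_count` are hypothetical names; `mismatch_count` is just a stand-in distance so the snippet runs on its own, and you would pass in a Levenshtein or Damerau–Levenshtein function instead.

```python
from itertools import combinations

def mismatch_count(a, b):
    # Stand-in distance (an assumption, not one of the two formulas above):
    # differing positions plus the difference in length.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def closest_pair(seqs, distance):
    """Return (distance, i, j) for the pair of indices with the smallest distance."""
    # Needs at least two sequences; min() raises ValueError otherwise.
    return min((distance(seqs[i], seqs[j]), i, j)
               for i, j in combinations(range(len(seqs)), 2))

seqs = ['ACGTAC', 'ACGTTC', 'TTGTAA', 'CCCCCC']
print(closest_pair(seqs, mismatch_count))  # (1, 0, 1)
```

Ties are broken in favor of the lowest indices, because the tuples are compared element by element.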