Trouble appending values to an OrderedDict in a loop

Time: 2019-05-06 23:07:22

Tags: similarity ordereddictionary

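I'm trying to write a program that takes several input sequences (strings) of variable length and counts how many identical characters appear at the same index in each string, in order to find the strings with the highest number of matches. If the strings have different lengths, I want to append "N" characters to the end until they are all the same length.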
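The procedure described above (pad every string to a common length with "N", then count same-index matches for every pair) could be sketched roughly as follows. The function names are hypothetical, and the sketch assumes that padding positions should not count as matches, so an "N" lined up with another "N" is skipped; drop the `x != fill` check if you want them counted.

```python
from itertools import combinations

def pad_sequences(seqs, fill="N"):
    """Right-pad every sequence with `fill` until all share the longest length."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, fill) for s in seqs]

def positional_matches(a, b, fill="N"):
    """Count indices where a and b hold the same character, ignoring padding."""
    return sum(1 for x, y in zip(a, b) if x == y and x != fill)

def best_matching_pair(seqs, fill="N"):
    """Return (i, j, matches) for the pair with the most same-index matches."""
    padded = pad_sequences(seqs, fill)
    return max(
        ((i, j, positional_matches(padded[i], padded[j], fill))
         for i, j in combinations(range(len(padded)), 2)),
        key=lambda triple: triple[2],
    )

seqs = ["GATTACA", "GATTC", "CATTACAG"]
print(pad_sequences(seqs))       # ['GATTACAN', 'GATTCNNN', 'CATTACAG']
print(best_matching_pair(seqs))  # (0, 2, 6)
```

Note that skipping the fill character also skips any genuine "N" (unknown base) already present in the input, which is usually what you want when "N" means "unknown".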


Edit: a new angle I'm working on:

import random

def DNA(length):
    return ''.join(random.choice('CGTA') for i in range(length))

seq_lst = [DNA(20) for i in range(5)]
print(seq_lst)

from collections import OrderedDict

count_dict = OrderedDict()
count_dict['_0_to_1count'] = 0
count_dict['_0_to_2count'] = 0
count_dict['_0_to_3count'] = 0
count_dict['_0_to_4count'] = 0
count_dict['_1_to_2count'] = 0
count_dict['_1_to_3count'] = 0
count_dict['_1_to_4count'] = 0
count_dict['_2_to_1count'] = 0
count_dict['_2_to_3count'] = 0
count_dict['_2_to_4count'] = 0
count_dict['_3_to_1count'] = 0
count_dict['_3_to_2count'] = 0
count_dict['_3_to_4count'] = 0


indexa = 0
indexb = 1
indexc = 0
indexd = 0

for i in seq_lst[indexa]:
    if i == seq_lst[indexb][indexc]:
        count_dict[indexd][2]
        indexc += 1
    if indexc == len(seq_lst[indexb]):
        indexb += 1
        indexd += 1
    if indexb > len(seq_lst):
        indexa +=1
        indexb = indexa + 1
    if indexa > len(seq_lst):
        break



print(count_dict)
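Why the loop above fails: whenever a comparison succeeds, `count_dict[indexd]` looks up the integer `0` in a dictionary whose keys are strings, which raises `KeyError: 0`; even with a valid key, `count_dict[...][2]` never assigns anything. In addition, `for i in seq_lst[indexa]` only ever walks the characters of `seq_lst[0]`, because reassigning `indexa` inside the loop does not change the iterable. A minimal corrected sketch follows, assuming the goal is one counter per unordered pair of sequences (the hand-written key list has 13 entries, including reversed duplicates such as `_2_to_1count`, while 5 sequences form only 10 distinct pairs):

```python
import random
from collections import OrderedDict
from itertools import combinations

def DNA(length):
    """Return a random DNA string of the given length."""
    return ''.join(random.choice('CGTA') for _ in range(length))

seq_lst = [DNA(20) for _ in range(5)]
print(seq_lst)

# One counter per unordered pair (a, b) with a < b, keyed like the original
# attempt: '_0_to_1count', '_0_to_2count', ..., '_3_to_4count'.
count_dict = OrderedDict()
for a, b in combinations(range(len(seq_lst)), 2):
    key = '_{}_to_{}count'.format(a, b)
    # zip walks both strings index by index, so no manual index bookkeeping
    count_dict[key] = sum(1 for x, y in zip(seq_lst[a], seq_lst[b]) if x == y)

print(count_dict)
# Key of the pair with the highest number of same-index matches
print(max(count_dict, key=count_dict.get))
```

Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` is not strictly required here.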

1 Answer:

Answer 0 (score: 1)

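I don't understand part of what you're asking, but I'll offer a good way to compare two strings, which you can then use to compare the dictionary you build from the n permutations.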

Computer science and mathematics have a concept called the Levenshtein distance, which measures the difference between two character sequences.

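It underlies much of text analysis and gives you a computational way to compare strings: it tells you how many changes (insertions, deletions, and substitutions) are needed to turn one string into the other.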

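Mathematically, the distance between strings a and b is $\operatorname{lev}_{a,b}(|a|, |b|)$, where

$$
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0,\\
\min\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1\\
\operatorname{lev}_{a,b}(i,j-1) + 1\\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise.}
\end{cases}
$$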

The Levenshtein distance algorithm, written as runnable Python (strings are 0-indexed, so s[i - 1] is the i-th character of s):

def levenshtein_distance(s, t):
    m, n = len(s), len(t)
    # for all i and j, d[i][j] will hold the Levenshtein distance between
    # the first i characters of s and the first j characters of t;
    # note that d has (m+1)*(n+1) values, all initialised to zero
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # source prefixes can be transformed into the empty string by
    # dropping all characters
    for i in range(1, m + 1):
        d[i][0] = i

    # target prefixes can be reached from the empty source prefix
    # by inserting every character
    for j in range(1, n + 1):
        d[0][j] = j

    for j in range(1, n + 1):
        for i in range(1, m + 1):
            # s[i - 1] and t[j - 1] are the i-th and j-th characters
            substitution_cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                          d[i][j - 1] + 1,                      # insertion
                          d[i - 1][j - 1] + substitution_cost)  # substitution

    return d[m][n]

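Another, similar approach is the Damerau–Levenshtein distance, which additionally counts swapping two adjacent characters as a single edit.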
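As a rough illustration of the Damerau–Levenshtein idea, here is a sketch of its restricted variant, the optimal string alignment (OSA) distance. This is a deliberate simplification: OSA forbids editing the same substring more than once, so it can differ from the unrestricted Damerau–Levenshtein distance (for "CA" vs "ABC", OSA gives 3 while the unrestricted distance is 2). The function name is hypothetical.

```python
def osa_distance(s, t):
    """Optimal string alignment distance (restricted Damerau-Levenshtein):
    Levenshtein plus transposition of two adjacent characters."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # the last two characters of each prefix are swapped copies
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(osa_distance("GATC", "GTAC"))  # 1 (plain Levenshtein would give 2)
```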

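From experience, this is what I'd suggest for finding the closest match: compute the distance between each pair of DNA sequences (with either of the two measures), then pick the pair with the smallest distance out of all the comparisons. Since this sounds like a homework problem and you need to learn, I don't want to hand you a full solution, but there are plenty of resources online, and this should give you a good start on what you're trying to do.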
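The selection step described above (compute the distance for every pair, keep the smallest) might look roughly like this. It is a sketch, not the answerer's code: the helper names are hypothetical, and the distance function uses the common two-row formulation of the same dynamic program, which keeps only the previous row of the table and so needs memory proportional to the shorter loop rather than the full (m+1)*(n+1) grid.

```python
from itertools import combinations

def levenshtein(s, t):
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(t) + 1))  # row for the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]

def closest_pair(seqs):
    """Return (distance, i, j) for the pair of sequences with the smallest distance."""
    return min((levenshtein(seqs[i], seqs[j]), i, j)
               for i, j in combinations(range(len(seqs)), 2))

seqs = ["GATTACA", "GATTC", "CATTACAG"]
print(closest_pair(seqs))  # (2, 0, 1)
```

Because the result is a tuple, ties in distance are broken by the lowest index pair.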