Question

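I am trying to write a program that takes several input sequences (strings) of variable length and counts the number of identical characters at the same index in each string, in order to find the strings with the highest number of matches. If the strings differ in length, I want to append "N" characters to the end until they are the same length.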


Edit: a new angle I am working on:

import random

def DNA(length):
    return ''.join(random.choice('CGTA') for i in range(length))

seq_lst = [DNA(20) for i in range(5)]
print(seq_lst)

from collections import OrderedDict

count_dict = OrderedDict()
count_dict['_0_to_1count'] = 0
count_dict['_0_to_2count'] = 0
count_dict['_0_to_3count'] = 0
count_dict['_0_to_4count'] = 0
count_dict['_1_to_2count'] = 0
count_dict['_1_to_3count'] = 0
count_dict['_1_to_4count'] = 0
count_dict['_2_to_1count'] = 0
count_dict['_2_to_3count'] = 0
count_dict['_2_to_4count'] = 0
count_dict['_3_to_1count'] = 0
count_dict['_3_to_2count'] = 0
count_dict['_3_to_4count'] = 0


indexa = 0
indexb = 1
indexc = 0
indexd = 0

for i in seq_lst[indexa]:
    if i == seq_lst[indexb][indexc]:
        count_dict[indexd][2]
        indexc += 1
    if indexc == len(seq_lst[indexb]):
        indexb += 1
        indexd += 1
    if indexb > len(seq_lst):
        indexa +=1
        indexb = indexa + 1
    if indexa > len(seq_lst):
        break



print(count_dict)

Answer 1

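I don't understand parts of what you are asking, but I will offer a good way to compare two strings, which can then be used to compare the entries of a dictionary built from the n permutations.

There is a concept in computer science and mathematics called the Levenshtein distance, which measures the difference between two sequences of characters.

It is a staple of text analysis and gives a computational way to compare strings: you can use it to determine how many changes (insertions, deletions, and substitutions) are needed to turn one string into another.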

Mathematically, the distance between strings a and b can be measured as follows:

Some pseudocode for the Levenshtein distance:
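
The usual textbook form of the definition is given here, with $\operatorname{lev}_{a,b}(i,j)$ denoting the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. The notation is an assumption, chosen to line up with the pseudocode's `d[i, j]`:

\[
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0,\\[4pt]
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1\\
\operatorname{lev}_{a,b}(i,j-1) + 1\\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise.}
\end{cases}
\]

Here $1_{(a_i \neq b_j)}$ is 1 when the two characters differ and 0 when they match, and the distance between the whole strings is $\operatorname{lev}_{a,b}(|a|,|b|)$. The three terms inside the minimum correspond to deletion, insertion, and substitution.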

function LevenshteinDistance(char s[1..m], char t[1..n]):
  // for all i and j, d[i,j] will hold the Levenshtein distance between
  // the first i characters of s and the first j characters of t
  // note that d has (m+1)*(n+1) values
  declare int d[0..m, 0..n]

  set each element in d to zero

  // source prefixes can be transformed into empty string by
  // dropping all characters
  for i from 1 to m:
      d[i, 0] := i

  // target prefixes can be reached from empty source prefix
  // by inserting every character
  for j from 1 to n:
      d[0, j] := j

  for j from 1 to n:
      for i from 1 to m:
          if s[i] = t[j]:
            substitutionCost := 0
          else:
            substitutionCost := 1
          d[i, j] := minimum(d[i-1, j] + 1,                   // deletion
                             d[i, j-1] + 1,                   // insertion
                             d[i-1, j-1] + substitutionCost)  // substitution

  return d[m, n]
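
As a rough Python translation of the pseudocode above (a sketch, not a tuned implementation), the function name `levenshtein_distance` is just an illustrative choice. Python strings are 0-indexed, so the comparison uses `s[i - 1]` and `t[j - 1]`:

```python
def levenshtein_distance(s, t):
    """Return the Levenshtein distance between strings s and t."""
    m, n = len(s), len(t)

    # d[i][j] holds the distance between the first i characters of s
    # and the first j characters of t; the table has (m+1)*(n+1) cells.
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # A prefix of s becomes the empty string by deleting every character.
    for i in range(1, m + 1):
        d[i][0] = i

    # A prefix of t is reached from the empty string by inserting every character.
    for j in range(1, n + 1):
        d[0][j] = j

    for j in range(1, n + 1):
        for i in range(1, m + 1):
            substitution_cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                      # deletion
                d[i][j - 1] + 1,                      # insertion
                d[i - 1][j - 1] + substitution_cost,  # substitution
            )

    return d[m][n]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

The full table costs O(m*n) memory. That is fine for short DNA strings like the 20-character ones in the question, though keeping only two rows would be enough for much longer inputs.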

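Another, similar measure is the Damerau–Levenshtein distance.

In my experience, this is what I would recommend for finding the closest match: compute the distance between each pair of DNA sequences (using either of the two measures), then pick the sequence with the smallest distance across all comparisons. Since this sounds like a homework problem and you need to learn, I don't want to hand over a solution, but there are plenty of resources online, and this should give you a good start on what you are trying to do.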
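
One way to sketch the selection step described above: total each sequence's distance to every other sequence and take the smallest total. The names `mismatch_distance` and `closest_sequence` are hypothetical. The default distance here is not Levenshtein but a simpler stand-in taken from the question: pad the shorter string with "N" and count the positions that differ. Any two-argument distance function can be passed in instead.

```python
from itertools import zip_longest


def mismatch_distance(a, b, pad="N"):
    """Count positions where a and b differ, padding the shorter one with pad."""
    # Stand-in distance (assumption): position-by-position comparison as in
    # the question, not an edit distance.
    return sum(x != y for x, y in zip_longest(a, b, fillvalue=pad))


def closest_sequence(seqs, distance=mismatch_distance):
    """Return (index of the sequence with the smallest total distance, all totals)."""
    totals = [
        sum(distance(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    best = min(range(len(seqs)), key=totals.__getitem__)
    return best, totals


seqs = ["ACGT", "ACGA", "TCGA", "AC"]
best, totals = closest_sequence(seqs)
print(best, totals)  # 1 [5, 4, 6, 7]
```

Summing over all other sequences is only one reading of "smallest distance across all comparisons". If the goal is the single closest pair, taking the minimum over pairs from `itertools.combinations(seqs, 2)` would be the analogous sketch.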
