Question

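I have some code that is supposed to take a DNA sequence containing errors, remove all the errors (in my case, N), and report how many Ns were removed at that position.

My code: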

class dnaString (str):
    def __new__(self,s):
        #the inputted DNA sequence is converted as a string in all upper cases
        return str.__new__(self,s.upper())      
    def getN (self):
        #returns the count of value of N in the sequence
        return self.count("N")
    def remove(self):

        print(self.replace("N", "{}".format(coolString.getN())))
#asks the user to input a DNA sequence
dna = input("Enter a dna sequence: ")
#takes the inputted DNA sequence, ???
coolString = dnaString(dna)
coolString.remove()

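When I enter AaNNNNNNGTC, I should get AA{6}GTC as the answer, but when I run my code it prints AA666666GTC, because I end up replacing every error with the count. How can I insert the count only once?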

Answer 1

If you want to do this without external libraries, you can do it like this:

def fix_dna(dna_str):
    fixed_str = ''
    n_count = 0  # length of the current run of N's (0 when not inside a run)
    for base in dna_str:
        if base.upper() == 'N':
            n_count += 1
        else:
            if n_count:
                # a run of N's just ended: emit its length once
                fixed_str += '{' + str(n_count) + '}'
                n_count = 0
            fixed_str += base
    # flush a run of N's that ends the string
    if n_count:
        fixed_str += '{' + str(n_count) + '}'
    return fixed_str
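
For comparison, the same run-collapsing idea can be sketched with itertools.groupby from the standard library. This is only an illustrative variant, not part of the answer above: the function name collapse_n_runs is made up here, and it upper-cases the input the way the class in the question does.

```python
from itertools import groupby


def collapse_n_runs(dna_str):
    """Replace each run of consecutive 'N' with '{run length}'."""
    parts = []
    # groupby yields one group per run of identical adjacent characters
    for base, run in groupby(dna_str.upper()):
        run_length = len(list(run))
        if base == 'N':
            parts.append('{%d}' % run_length)
        else:
            parts.append(base * run_length)
    return ''.join(parts)


print(collapse_n_runs("AaNNNNNNGTC"))  # AA{6}GTC
print(collapse_n_runs("ANNCN"))        # A{2}C{1}
```

Because each run is handled as a whole group, a run of N's at the very end of the sequence needs no special case.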

Answer 2

Not the cleanest solution, but it gets the job done:

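from itertools import accumulate
from operator import add

s = "AaNNNNNNGTC"
# accumulate builds 'N', 'NN', ..., up to 100 N's; the longest runs are replaced first
# (a run longer than 100 N's would be split into several counts)
for i in reversed(list(enumerate(accumulate('N'*100, add)))):
    s = s.replace(i[1], '{'+str(i[0] + 1)+'}')
print(s)  # Aa{6}GTC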

Answer 3

This is expected; from the documentation:

Return a copy of string s with all occurrences of substring old replaced by new.
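
To make that concrete, here is a small sketch of what the code in the question effectively does. The variable names are only illustrative. str.replace substitutes every single N independently, so the count shows up once per N:

```python
s = "AaNNNNNNGTC".upper()
count = s.count("N")                    # 6
replaced = s.replace("N", str(count))   # each of the 6 N's becomes "6"
print(replaced)                         # AA666666GTC
```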

One solution is to use a regular expression. re.sub accepts a callable that generates the replacement string:

import re

def replace_with_count(x):
    return "{%d}" % len(x.group())

test = 'AaNNNNNNGTNNC'

print(re.sub('N+', replace_with_count, test))  # Aa{6}GT{2}C
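
Applied to the class from the question, the same re.sub call could replace the body of remove. This is a sketch under a few assumptions that are not in the original code: remove returns the string instead of printing it, it works on self rather than the global coolString, and the first parameter of __new__ is named cls.

```python
import re


class dnaString(str):
    def __new__(cls, s):
        # store the sequence in upper case, as in the question
        return str.__new__(cls, s.upper())

    def getN(self):
        # total number of N characters in the sequence
        return self.count("N")

    def remove(self):
        # replace every run of N's with its length in braces
        return re.sub("N+", lambda m: "{%d}" % len(m.group()), self)


print(dnaString("AaNNNNNNGTC").remove())  # AA{6}GTC
```

Returning the value leaves it up to the caller whether to print it or process it further.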

Remove all occurrences of a letter and replace them with a count of how many errors there were
