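Getting "TypeError: string indices must be integers" when comparing protein sequences

Asked: 2019-03-24 21:24:39

Tags: python list bioinformatics

How do I return a list of letters?

I have a sequence translator and Python code that reads a DNA sequence and a protein sequence. The code reads the DNA sequence and translates it into a protein sequence, reads the given protein sequence, compares it with the translated one, and prints a list of the amino acids that are present in the protein sequence that was read. How can I print the amino acids that are present in both proteins?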

def translate_codon(cod):
    """Translates a codon into an aminoacid using an internal dictionary with the standard genetic code."""
    tc = {"GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
          "TGT":"C", "TGC":"C",
          "GAT":"D", "GAC":"D",
          "GAA":"E", "GAG":"E",
          "TTT":"F", "TTC":"F",
          "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G",
          "CAT":"H", "CAC":"H",
          "ATA":"I", "ATT":"I", "ATC":"I",
          "AAA":"K", "AAG":"K",
          "TTA":"L", "TTG":"L", "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
          "ATG":"M", "AAT":"N", "AAC":"N",
          "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
          "CAA":"Q", "CAG":"Q",
          "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R", "AGA":"R", "AGG":"R",
          "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S", "AGT":"S", "AGC":"S",
          "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
          "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
          "TGG":"W",
          "TAT":"Y", "TAC":"Y",
          "TAA":"_", "TAG":"_", "TGA":"_"}
    if cod in tc:
        return tc[cod]
    else:
        return '-1'


def seq_prot(dna_seq, ab):
    seqm = dna_seq.upper()
    prot = ab.upper()
    seq_aa = ''
    for pos in range(0, len(seqm)-2,3):
        cod = seqm[pos:pos+3]
        seq_aa += translate_codon(cod)
    for p in seq_aa:
        if p in prot:
            seq_aa[p] += 1
        else:
            seq_aa = p

    return seq_aa

dna_seq = "ACCCCTGTGACATACCTTTATGTTGCCTCGGCGGATCAGCCCGCGCCCC"
ab = 'TLYPAP'

print("The protein sequence are :",seq_prot(dna_seq, ab))

The protein sequence is: TYPP

1 answer:

Answer 0 (score: 0)
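A minimal sketch of where the error in the title comes from, assuming only what the question's seq_prot shows: seq_aa is a str, and iterating over it yields one-character strings, so seq_aa[p] indexes a string with a string. The short sequence used here is an illustrative value, not data from the question.

```python
seq_aa = "TPVTYLYV"   # illustrative translated sequence; a plain str
p = seq_aa[0]         # iterating a str yields one-character strs, e.g. "T"

try:
    seq_aa[p] += 1    # same shape as the failing line in the question
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

Even with an integer index the line would still fail, because a str is immutable and does not support item assignment.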

Your code is broken because it treats seq_aa as both a str and a dict. Let's add an actual dictionary to collect the results:

def seq_prot(dna_seq, ab):
    sequence = dna_seq.upper()
    protein = ab.upper()
    matches = {}

    for position in range(0, len(sequence), 3):
        codon = sequence[position: position + 3]
        aa = translate_codon(codon)

        if aa in protein:
            if aa in matches:
                matches[aa] += 1
            else:
                matches[aa] = 1

    return matches

dna_seq = "ACCCCTGTGACATACCTTTATGTTGCCTCGGCGGATCAGCCCGCGCCCC"
ab = 'TLYPAP'

print("The protein sequence matches are :", seq_prot(dna_seq, ab))

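Output:

The protein sequence matches are : {'T': 2, 'P': 3, 'Y': 2, 'L': 1, 'A': 3}

You can call .keys() on the returned dictionary to extract the amino acid letters. If you want each letter repeated as many times as its count, you can use the multiplication sign (*) as a repetition operator. However, any sense of order is gone; we are only dealing with existence. If you want to preserve order, we have to take a different approach.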
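The points above can be sketched as follows. The dictionary is copied from the output shown; shared_in_order is a hypothetical helper and only one possible reading of "preserving order" (keep each residue of the translated sequence that also occurs in ab), not necessarily what the answer had in mind. The translated string is dna_seq translated by hand with the question's codon table, with the trailing incomplete codon dropped.

```python
matches = {'T': 2, 'P': 3, 'Y': 2, 'L': 1, 'A': 3}  # dictionary from the output above

# Letters that occur in both sequences
letters = list(matches.keys())

# Each letter repeated by its count, using * as the repetition operator
repeated = "".join(aa * count for aa, count in matches.items())


def shared_in_order(translated, ab):
    """Hypothetical helper: keep residues of `translated` that also occur in `ab`."""
    allowed = set(ab.upper())
    return "".join(aa for aa in translated.upper() if aa in allowed)


translated = "TPVTYLYVASADQPAP"  # assumption: dna_seq translated by hand
ordered = shared_in_order(translated, "TLYPAP")

print(letters)   # ['T', 'P', 'Y', 'L', 'A']
print(repeated)  # TTPPPYYLAAA
print(ordered)   # TPTYLYAAPAP
```

The first two results lose positional information, while the last keeps the residues in the order they appear in the translated sequence.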