Question

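This is going to be long, but I don't know how else to explain it effectively.

So I'm reading 2 files. The first file has a list of characters. The second file is a list of 3-character codes, each followed by a matching identifier character (separated by a tab).

Using the second file, I created a dictionary with the 3-character codes as keys and the single characters as the corresponding values. What I need to do is take 3 characters at a time from the first list and compare them against the dictionary. If there's a match, I need to take the corresponding value and append it to a new list that I'll print out. If the match is the '*' character, I need to stop and not compare the rest of the list against the dictionary.

I'm having trouble with the comparison, and then with building the new list using append.

Here is part of the first input file: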

Seq0
ATGGAAGCGAGGATGtGa

And here is part of the second:

AUU     I
AUC     I
AUA     I
CUU     L
GUU     V
UGA     *
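The excerpt shows the columns separated by whitespace, and the question says they are tab-separated. A bare `line.split("\t")` on an empty line (for example a trailing newline at the end of the file) returns a single item, so unpacking it into two names raises `ValueError`. A sketch of building the codon dict that tolerates blank lines and either separator; the function name `load_codon_table` is hypothetical:

```python
def load_codon_table(lines):
    """Build a codon -> amino-acid dict from 'CODON<whitespace>ACID' lines."""
    table = {}
    for line in lines:
        parts = line.split()  # no argument: split on any run of whitespace
        if len(parts) != 2:
            continue  # skip blank or malformed lines instead of crashing
        codon, acid = parts
        table[codon.upper()] = acid
    return table


sample = ["AUU\tI\n", "AUC\tI\n", "AUA\tI\n", "CUU\tL\n", "GUU\tV\n", "UGA\t*\n", "\n"]
table = load_codon_table(sample)
print(table["GUU"], table["UGA"])  # V *
```

With a real file this would be called as `load_codon_table(open("codons.txt"))`, since iterating over a file object yields its lines.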

Here is my code so far:

input = open("input.fasta", "r")
codons = open("codons.txt", "r")

counts = 1
amino_acids = {}

for lines in codons:
        lines = lines.strip()
        codon, acid = lines.split("\t")
        amino_acids[codon] = acid
        counts += 1

count = 1

for line in input:
        if count%2 == 0:
                line = line.upper()
                line = line.strip()
                line = line.replace(" ", "")
                line = line.replace("T", "U")

                import re

                if not re.match("^[AUCG]*$", line):
                        print "Error!"

                if re.match("^[AUCG]*$", line):
                        mrna = len(line)/3
                        first = 0
                        last = 3

                        while mrna != 0:
                                codon = line[first:last]
                                first += 3
                                last += 3
                                mrna -= 1
                                list = []

                                if codon == amino_acids[codon]:
                                        list.append(acid)

                                        if acid == "*":
                                                mrna = 0

                                for acid in list:
                                        print acid

So I want my output to look like this:

M    L    I    V    *

But I'm nowhere near that yet. Please help!
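The desired output puts all the letters on one line, whereas calling `print` inside a loop puts each letter on its own line. Once the letters have been collected in a list, `str.join` builds the single line. A minimal sketch; the tab separator is an assumption based on the spacing shown:

```python
acids = ["M", "L", "I", "V", "*"]  # example list matching the desired output

# join() concatenates the strings, inserting the separator between them.
line = "\t".join(acids)
print(line)
```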

Answer 1

The following code is completely untested. Check the indentation, syntax, and logic, but it should be closer to what you want.

import re

# Build the codon -> amino-acid lookup table from the tab-separated file.
amino_acids = {}
with open("codons.txt", "r") as codons:
    for line in codons:
        line = line.strip()
        if not line:
            continue    # skip blank lines
        codon, acid = line.split("\t")
        amino_acids[codon] = acid

acids = []
count = 0
with open("input.fasta", "r") as fasta:
    for line in fasta:
        count += 1
        if count % 2 == 0:    # i.e. only care about even (sequence) lines
            line = line.upper()
            line = line.strip()
            line = line.replace(" ", "")
            line = line.replace("T", "U")

            if not re.match("^[AUCG]*$", line):
                print("Error!")
            else:
                mrna = len(line) // 3    # number of complete codons
                first = 0
                while mrna != 0:
                    codon = line[first:first+3]
                    first += 3
                    mrna -= 1
                    if codon in amino_acids:
                        acids.append(amino_acids[codon])
                        # compare the value just looked up, not a leftover variable
                        if amino_acids[codon] == "*":
                            mrna = 0    # stop codon: stop reading this sequence

for acid in acids:
    print(acid)
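The `count % 2 == 0` test works only because each record in the input is exactly one header line followed by one sequence line, as in the `Seq0` excerpt. A sketch that states that assumption directly by pairing the lines; the helper `read_records` is hypothetical, and it would mis-pair records if a sequence ever wrapped onto several lines:

```python
def read_records(lines):
    """Pair each header line with the sequence line that follows it.

    Assumes every record is exactly two non-blank lines: a header,
    then a single (unwrapped) sequence line.
    """
    stripped = [line.strip() for line in lines if line.strip()]
    # Even-indexed entries are headers, odd-indexed entries are sequences.
    return list(zip(stripped[0::2], stripped[1::2]))


records = read_records(["Seq0\n", "ATGGAAGCGAGGATGtGa\n"])
for header, sequence in records:
    print(header, sequence)
```

Passing an open file, as in `read_records(open("input.fasta"))`, works the same way because a file iterates over its lines.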

Answer 2

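In Python there is usually a way to avoid writing explicit loops with counters and the like. The list comprehension syntax is very powerful and lets you build a list in a single line. With that in mind, here is another way to write your second for loop: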

import re

def codons_to_acids(amino_acids, sequence):
    sequence = sequence.upper().strip().replace(' ', '').replace('T', 'U')
    codons   = re.findall(r'...', sequence)
    acids    = [amino_acids.get(codon) for codon in codons if codon in amino_acids]

    if '*' in acids:
        acids = acids[:acids.index('*') + 1]

    return acids

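The first line does all of the string cleanup. Chaining the methods together makes the code more readable to me; you may or may not agree. The second line uses re.findall in a slightly tricky way to split the string every three characters. The third line is a list comprehension that looks up each codon in the amino_acids dict and builds a list of the resulting values.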
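`re.findall(r'...', s)` returns consecutive, non-overlapping three-character matches, so any trailing one or two characters are silently dropped. The same split can be done with slicing and no regular expression. A sketch (the name `split_codons` is hypothetical), checked against the regex version:

```python
import re


def split_codons(sequence):
    """Split a sequence into consecutive 3-character chunks.

    Like re.findall(r'...', sequence), a trailing fragment shorter
    than three characters is dropped.
    """
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


seq = "AUGGAAGCGAGGAUGUGAAU"
assert split_codons(seq) == re.findall(r"...", seq)
print(split_codons(seq))  # ['AUG', 'GAA', 'GCG', 'AGG', 'AUG', 'UGA']
```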

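There is no simple way to break out of the for loop inside a list comprehension, so the final if statement removes any entries that come after the first *.

You can call the function like this: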
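If stopping early matters, one alternative is a small generator that yields each translated residue and returns right after the stop symbol, so nothing past the `*` is looked up. A sketch; the name `translate_until_stop` is hypothetical, and `amino_acids` is a codon dict like the one in the question:

```python
def translate_until_stop(amino_acids, codons, stop="*"):
    """Yield amino acids for known codons, stopping after the stop symbol."""
    for codon in codons:
        acid = amino_acids.get(codon)
        if acid is None:
            continue  # unknown codon: skipped, as in the list comprehension
        yield acid
        if acid == stop:
            return  # ends the generator, like `break` in a plain loop


amino_acids = {"AUU": "I", "AUC": "I", "AUA": "I", "CUU": "L", "GUU": "V", "UGA": "*"}
codons = ["CUU", "AUU", "GUU", "UGA", "AUC"]
print(list(translate_until_stop(amino_acids, codons)))  # ['L', 'I', 'V', '*']
```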

amino_acids = {
    'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'CUU': 'L', 'GUU': 'V', 'UGA': '*'
}

print(codons_to_acids(amino_acids, 'ATGGAAGCGAGGATGtGaATT'))

Answer 3

If you can solve the problem without regular expressions, it's better not to use them.
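For instance, the `re.match("^[AUCG]*$", line)` check from the question can be written as a set comparison. For a stripped line the two agree, including on the empty string (the regex's `$` also matches just before a trailing newline, which is one more reason to strip first). A sketch with the hypothetical name `is_rna`:

```python
import re


def is_rna(line):
    """Return True if line contains only A, U, C and G (an empty line counts)."""
    return set(line) <= set("AUCG")


# Compare with the regex from the question on a few stripped lines.
for s in ["AUGGAAGCG", "ATGGAA", "aucg", ""]:
    print(repr(s), is_rna(s), bool(re.match("^[AUCG]*$", s)))
```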

with open('input.fasta', 'r') as f1:
    sequence = f1.read()

with open('codons.txt', 'r') as f2:
    codons = f2.readlines()

# Keep only nucleotide letters and convert DNA to RNA. Note: header
# characters that happen to be A, T, C or G would also be kept.
sequence = [x.replace('T', 'U') for x in sequence.upper() if x in 'ATCG']
chunks = [''.join(sequence[x:x+3]) for x in range(0, len(sequence), 3)]

codons = [c.strip().upper() for c in codons if c.strip()]

my_dict = {q.split()[0]: q.split()[1] for q in codons}

result = list()

for ch in chunks:
    # get() does not remove the key, so repeated codons are matched every time
    new_elem = my_dict.get(ch)
    if new_elem is None:
        print('Invalid key!')
    else:
        result.append(new_elem)
        if new_elem == '*':
            break

print(result)
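`dict.pop(key, default)` returns the value and also deletes the key, so a codon that occurs twice (the sample sequence contains `ATG`, i.e. `AUG`, twice) is found only the first time. `dict.get` does the same lookup without modifying the dict. A small demonstration with a hypothetical two-entry table:

```python
table = {"AUG": "M", "UGA": "*"}

# pop() returns the value but removes the key, so the repeat lookup misses.
first = table.pop("AUG", None)
second = table.pop("AUG", None)
print(first, second)  # M None

table = {"AUG": "M", "UGA": "*"}

# get() leaves the dict unchanged, so every occurrence is found.
print(table.get("AUG"), table.get("AUG"))  # M M
```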

Python: Appending to a list from a dictionary
