I am trying to find all possible reading frames of at least a minimum nucleotide length.
"A[TU]G(?:(...){3}){%d,}?(?:[TU]AG|[TU]AA|[TU]GA)" % (minimal_aa)
This almost does what I need, but for some reason some of the reading frames do not recognize certain stop codons.
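I'm sure it has something to do with the (...) part. How do I tell it to always stop at [TU]AG|[TU]AA|[TU]GA, while still letting it run through multiple start codons?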
I am using Python in Eclipse.
I am checking my strings with Pythex.org, but here is a sample of what I am talking about:
AUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAACACACGUCCAACUCAGUUUGCCUGUCCUUCAGGUUAGAGACGUGCUAGUGCGUGGCUUCGGGGACUCUGUGGAAGAGGCCCUAUCGGAGGCACGUGAACACCUCAAAAAUGGCACUUGUGGUCUAGUAGAGCUGGAAAAAGGCGUACUGCCCCAGCUUGAACAGCCCUAUGUGUUCAUUAAACGUUCUGAUGCCUUAAGCACCAAUCACGGCCACAAGGUCGUUGAGCUGGUUGCAGAAAUGGACGGCAUUCAGUACGGUCGUAGCGGUAUAACACUGGGAGUACUCGUGCCACAUGUGGGCGAAACCCCAAUUGCAUACCGCAAUGUUCUUCUUCGUAAGAACGGUAAUAAGGGAGCCGGUGGUCAUAGCUAUGGCAUCGAUCUAAAGUCUUAUGACUUAGGUGACGAGCUUGGCACUGAUCCCAUUGAAGAUUAUGAACAAAACUGGAACACUAAGCAUGGCAGUGGUGCACUCCGUGAACUCACUCGUGAGCUCAAUGGAGGUGCAGUCACUCGCUAUGUCGACAACAAUUUCUGUGGCCCAGAUGGGUACCCUCUUGAUUGCAUCAAAGAUUUUCUCGCACGCGCGGGCAAGUCAAUGUGCACUCUUUCCGAACAACUUGAUUACAUCGAGUCGAAGAGAGGUGUCUACUGCUGCCGUGACCAUGAGCAUGAAAUUGCCUGGUUCACUGAGCGCUCUGAUAAGAGCUACGAGCACCAGACACCCUUCGAAAUUAAGAGUGCCAAGAAAUUUGACACUUUCAAAGGGGAAUGCCCAAAGUUUGUGUUUCCUCUUAACUCAAAAGUCAAAGUCAUUCAACCACGUGUUGAAAAGAAAAAGACUGAGGGUUUCAUGGGGCGUAUACGCUCUGUGUACCCUGUUGCAUCUCCACAGGAGUGUAACAAUAUGCACUUGUCUACCUUGAUGAAAUGUAAUCAUUGCGAUGAAGUUUCAUGGCAGA CGUGCGACUUUCUGAAAGCCACUUGUGAACAUUGUGGCACUGAAAAUUUAGUUAUUGAAGGACCUACUACAUGUGGGUACCUACCUACUAAUGCUGUAGUGAAAAUGCCAUGUCCUGCCUGUCAAGACCCAGAGAUUGGACCUGAGCAUAGUGUUGCAGAUUAUCACAACCACUCAAACAUUGAAACUCGACUCCGCAAGGGAGGUAGGACUAGAUGUUUUGGAGGCUGUGUGUUUGCCUAUGUUGGCUGCUAUAAUAAGCGUGCCUACUGGGUUCCUCGUGCUAGUGCUGAUAUUGGCUCAGGCCAUACUGGCAUUAA
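Wait. That is a bad example, because now that I have checked the output by eye, it actually works. I had to shorten it, but the full sequence is several thousand nucleotides long and full of stop codons, and nothing works properly on it. I hope you see what I mean; if not, don't worry about it.
Thanks in advance, amigos!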
Answer 0 (score: 1)
Try this pattern to find all the short, possibly overlapping sequences:
(?=A[TU]G((?:.{3})+?)[TU](?:AG|AA|GA))
Each sequence, without its start and stop codons, is available in capture group 1.
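Here is a minimal sketch of how the pattern above might be used from Python. The function name find_orf_bodies is my own invention, and so is the choice to apply the question's minimal_aa as a filter after matching rather than inside the regex. Because the whole pattern sits inside a lookahead, each match is zero-width, so re.finditer can report reading frames that overlap.

```python
import re

# The pattern from the answer, wrapped in a lookahead so matches may overlap.
ORF_PATTERN = re.compile(r"(?=A[TU]G((?:.{3})+?)[TU](?:AG|AA|GA))")


def find_orf_bodies(seq, minimal_aa=1):
    """Return (start_index, body) pairs, where body excludes start and stop codons.

    minimal_aa is applied as a post-filter here (an assumption, not part of
    the answer's pattern): bodies shorter than minimal_aa codons are dropped.
    """
    # "." does not match newlines, so remove all whitespace before matching.
    seq = "".join(seq.split()).upper()
    results = []
    for match in ORF_PATTERN.finditer(seq):
        body = match.group(1)
        if len(body) // 3 >= minimal_aa:
            results.append((match.start(), body))
    return results


if __name__ == "__main__":
    print(find_orf_bodies("AUGGCCUAAAUGAAACCCUGA"))
    print(find_orf_bodies("AUGAUGCCCUAA"))
```

On the first toy sequence this prints [(0, 'GCC'), (9, 'AAACCC')], and on the second it prints [(0, 'AUGCCC'), (3, 'CCC')], where the second frame starts inside the first. Filtering after the match keeps every body ending at the first in-frame stop the lazy quantifier reaches; putting the minimum inside the regex as {n,}? would instead let the first n codons run over stop codons. Note that +? always consumes one codon before a stop is tested, so a stop codon directly after the start codon ends up inside the body.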