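Biologists use a sequence of the letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, and TGA.
Ideally: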
Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT #Enter
TTT
GGGCGT
-----------------
Enter a genome string: TGTGTGTATAT
No Genes Were Found
So far I have:
def findGene(gene):
    final = ""
    genep = gene.split("ATG")
    for part in genep:
        for chr in part:
            for i in range(0, len(chr)):
                if genePool(chr[i:i + 3]) == 1:
                    break
                else:
                    final += (chr[i+i + 3] + "\n")
    return final

def genePool(part):
    g1 = "ATG"
    g2 = "TAG"
    g3 = "TAA"
    g4 = "TGA"
    if (part.count(g1) != 0) or (part.count(g2) != 0) or (part.count(g3) != 0) or (part.count(g4) != 0):
        return 1

def main():
    geneinput = input("Enter a genome string: ")
    print(findGene(geneinput))

main()
# TTATGTTTTAAGGATGGGGCGTTAGTT
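I keep getting an error.
To be honest, this really isn't working for me. I think I've hit a dead end with these lines of code, and a new approach might help.
Thanks in advance!
The error I'm getting: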
Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT
Traceback (most recent call last):
File "D:\Python\Chapter 8\Bioinformatics.py", line 40, in <module>
main()
File "D:\Python\Chapter 8\Bioinformatics.py", line 38, in main
print(findGene(geneinput))
File "D:\Python\Chapter 8\Bioinformatics.py", line 25, in findGene
final += (chr[i+i + 3] + "\n")
IndexError: string index out of range
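Like I said before, I'm not sure whether my current code is on the right track to solve the problem. Any new ideas with pseudocode would be appreciated!
Answer 0 (score: 3)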
This can be done with a regular expression:
import re
pattern = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')
pattern.findall('TTATGTTTTAAGGATGGGGCGTTAGTT')
pattern.findall('TGTGTGTATAT')
Output
['TTT', 'GGGCGT']
[]
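To get the exact output format from the question, the pattern can be wrapped in a small function. The following is a minimal sketch; the names `find_genes` and `report` are hypothetical and not part of the original answer. Like the pattern itself, it relies on the lazy triplet match to stop at the first stop codon it can reach, and it does not separately verify that the captured string is free of ATG, TAG, TAA, or TGA at every offset.

```python
import re

# Same pattern as in the answer: ATG, then the shortest run of whole
# triplets, then one of the three stop codons.
GENE_PATTERN = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')


def find_genes(genome):
    """Return the captured gene strings in order of appearance."""
    return GENE_PATTERN.findall(genome)


def report(genome):
    """Build the text to print: one gene per line, or a fallback message."""
    genes = find_genes(genome)
    return "\n".join(genes) if genes else "No Genes Were Found"


print(report('TTATGTTTTAAGGATGGGGCGTTAGTT'))
print(report('TGTGTGTATAT'))
```

For the two sample genomes this prints TTT and GGGCGT on separate lines, followed by No Genes Were Found. Replacing the hard-coded strings with `input("Enter a genome string: ")` gives the interactive version.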
Explanation, taken from https://regex101.com/r/yI4tN9/3:
"ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)"g
ATG matches the characters ATG literally (case sensitive)
1st Capturing group ((?:[ACTG]{3})+?)
    (?:[ACTG]{3})+? Non-capturing group
        Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
        [ACTG]{3} match a single character present in the list below
            Quantifier: {3} Exactly 3 times
            ACTG a single character in the list ACTG literally (case sensitive)
(?:TAG|TAA|TGA) Non-capturing group
    1st Alternative: TAG
        TAG matches the characters TAG literally (case sensitive)
    2nd Alternative: TAA
        TAA matches the characters TAA literally (case sensitive)
    3rd Alternative: TGA
        TGA matches the characters TGA literally (case sensitive)
g modifier: global. All matches (don't return on first match)
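The breakdown above can also be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the pattern. This is only a sketch of the same expression in a different layout; the name `VERBOSE_PATTERN` is hypothetical. In Python, `findall` plays the role of the `g` modifier, since it returns every non-overlapping match.

```python
import re

# The same expression, spread over several lines with inline comments.
VERBOSE_PATTERN = re.compile(r"""
    ATG                   # start codon, matched literally
    (                     # capturing group 1: the gene itself
        (?:[ACTG]{3})+?   # one or more whole triplets, as few as possible
    )
    (?:TAG|TAA|TGA)       # stop codon, matched but not captured
""", re.VERBOSE)

print(VERBOSE_PATTERN.findall('TTATGTTTTAAGGATGGGGCGTTAGTT'))
print(VERBOSE_PATTERN.findall('TGTGTGTATAT'))
```

Both calls should give the same lists as the compact pattern.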