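So I'm trying to write a program that can be used to analyze DNA, and right now I'm trying to split the genes into "strands". To do that, I need to scan the strand and split it at each of the three STOP codons (groups of three base pairs). My current code looks like this: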
class Strand:
    def __init__(self, code):
        self.code = [code]
        self.endCodons = []
        self.genes = []

    def getGenes(self):
        for codon in self.endCodons:
            for code in self.code:
                code = code.split(codon)

strand = Strand("ATCATGCACATAGAAACTGATACACACCACAGTGATCACATGAAGTACACATG")
strand.getGenes()
print(strand.genes)
However, when I run it, it returns an empty list. Any advice would be appreciated.
Answer (score: 1)
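Looping over the stop codons and splitting on each one in turn will give the wrong output: I assume the stop codons can appear in any order in the sequence, whereas iterating over the list of stop codons effectively requires them to occur in that same order. (Separately, your `getGenes` never appends anything to `self.genes`, and `self.endCodons` is empty so the loop body never runs; that is why you get an empty list.)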
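If you do want to keep a split-based approach, the ordering problem goes away when all three stop codons are matched by a single regular expression, so the string is cut at whichever stop codon comes next. Below is a minimal sketch using the standard library's `re.split`. The function name `split_on_any_stop` is hypothetical, and it assumes the three standard stop codons and that each stop codon should stay at the end of the piece it terminates. Note that it ignores the reading frame: a stop codon is matched at any offset, not only at multiples of 3.

```python
import re

# Assumption: the three standard stop codons, as in the answer's code.
STOP_CODONS = ["TAG", "TAA", "TGA"]


def split_on_any_stop(sequence):
    """Split `sequence` after every stop codon, whichever one occurs first.

    One alternation pattern matches all stop codons at once, so the order
    of STOP_CODONS does not matter. Matches can start at any offset, i.e.
    the reading frame is NOT respected.
    """
    pattern = "(" + "|".join(STOP_CODONS) + ")"
    # With a capturing group, re.split returns
    # [chunk, stop, chunk, stop, ..., trailing_chunk].
    parts = re.split(pattern, sequence)
    pieces = []
    for i in range(0, len(parts) - 1, 2):
        # Glue each chunk to the stop codon that ended it.
        pieces.append(parts[i] + parts[i + 1])
    if parts[-1]:
        # Trailing bases after the last stop codon, if any.
        pieces.append(parts[-1])
    return pieces
```

For example, `split_on_any_stop("AATAGCC")` gives `["AATAG", "CC"]`, and `"".join(...)` of the result always reproduces the input.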
So, if I understand correctly, you need to scan the string from left to right and look for the stop codons as you go:
class Strand:
    def __init__(self, code):
        self.code = code
        self.endCodons = ["TAG", "TAA", "TGA"]
        self.genes = []

    def getGenes(self):
        if len(self.code) % 3 != 0:
            print("Input sequence is not divisible by 3?")
        # In this, we assume each stop codon is always 3 characters.
        iteration = 0
        lastGeneEnd = 0
        while iteration < len(self.code):
            # What is our current 3 character sequence? (Unless it's at the end)
            currentSequence = self.code[iteration:iteration + 3]
            # Check if our current 3 character sequence is an end codon
            if currentSequence in self.endCodons:
                # What will our gene length be?
                geneLength = (iteration + 3) - lastGeneEnd
                # Make sure we only break into multiples of 3
                overlap = 3 - (geneLength % 3)
                # There is no overlap if our length is already a multiple of 3
                if overlap == 3:
                    overlap = 0
                # Modify the gene length to reflect our overlap into a multiple of 3
                geneLength = geneLength + overlap
                # Update the iteration so we don't process any more than we need
                iteration = iteration + overlap + 3
                # Grab the entire gene sequence, including the stop codon
                gene = self.code[lastGeneEnd:iteration]
                # If we have a 3-length gene and there's nothing left, append it to the last
                # gene retrieved, as it has to be part of the last sequence. This needs a
                # previous gene to exist; otherwise fall through and append it as a new gene.
                if len(gene) == 3 and iteration >= len(self.code) and self.genes:
                    self.genes[-1] = self.genes[-1] + gene
                    break
                # Make sure we update the last end index so we don't include portions of previous positives
                lastGeneEnd = iteration
                # Append the result to our genes and continue
                self.genes.append(gene)
                continue
            iteration = iteration + 1

strand = Strand("ATCATGCACATAGAAACTGATACACACCACAGTGATCACATGAAGTACACATG")
strand.getGenes()
print("Got Genes: ")
print(strand.genes)
for gene in strand.genes:
    print("Sequence '%s' is a multiple of 3: %u" % (gene, len(gene) % 3 == 0))
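For comparison, here is a sketch of a different technique from the loop above: read the sequence codon by codon in one fixed reading frame (stepping 3 bases at a time) and cut after each stop codon found in that frame. Because every cut falls on a codon boundary, no padding step is needed. The function name `split_in_frame` and its `frame` parameter are hypothetical; the sketch assumes you know which frame (0, 1 or 2) to read in, and that bases left over after the last stop codon should be returned as a final piece rather than dropped.

```python
# Assumption: the three standard stop codons.
STOP_CODONS = {"TAG", "TAA", "TGA"}


def split_in_frame(sequence, frame=0):
    """Cut `sequence` after each stop codon in the given reading frame.

    Codons are read starting at offset `frame` and stepping 3 bases at a
    time, so every piece except possibly the last is a multiple of 3 long
    and ends with a stop codon. Bases before `frame` are skipped; bases
    after the last in-frame stop codon form the final piece.
    """
    genes = []
    start = frame
    # Stop at len - 2 so that sequence[i:i + 3] is always a full codon.
    for i in range(frame, len(sequence) - 2, 3):
        if sequence[i:i + 3] in STOP_CODONS:
            genes.append(sequence[start:i + 3])
            start = i + 3
    if start < len(sequence):
        genes.append(sequence[start:])  # leftover with no stop codon
    return genes
```

On the example sequence, frame 0 contains no stop codon at all, so `split_in_frame(seq)` returns the whole sequence as a single piece; trying frames 1 and 2 as well is one way to see where in-frame stops occur.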
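I'm not really a biologist, so I may have made some incorrect assumptions.

Edit:

The code is guaranteed to break the sequence into multiples of three, but I still don't think I fully understand the required logic. It works for the given example, but I'm not sure it works correctly in other cases.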
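One way to probe "other cases" is to check properties that any split should satisfy, on many inputs including random ones. A minimal sketch follows; the helper names `split_problems` and `random_sequence` are hypothetical, and the three requirements in the docstring are assumptions about what a correct split means here, not rules stated in the question.

```python
import random

# Assumption: the three standard stop codons.
STOP_CODONS = {"TAG", "TAA", "TGA"}


def split_problems(sequence, pieces):
    """Return a list of human-readable problems with a proposed split.

    Assumed requirements (not from the original post):
      * the pieces concatenate back to the original sequence,
      * every piece except the last ends with a stop codon,
      * every piece except the last has a length that is a multiple of 3.
    """
    problems = []
    if "".join(pieces) != sequence:
        problems.append("pieces do not reassemble the input")
    for n, piece in enumerate(pieces[:-1]):
        if piece[-3:] not in STOP_CODONS:
            problems.append("piece %d does not end with a stop codon" % n)
        if len(piece) % 3 != 0:
            problems.append("piece %d has length %d" % (n, len(piece)))
    return problems


def random_sequence(length, seed=None):
    """Build a random A/C/G/T string; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))
```

You could, for example, build `Strand(random_sequence(60, seed=n))` for a range of seeds, call `getGenes()`, and print `split_problems(strand.code, strand.genes)` for any case that reports a problem.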