Below is a simple script that detects introns along a given DNA sequence:
from re import match

f = open('data.txt', 'r').readlines()[1::2]  # keep every second line (the sequences)
u = [element[:-1] for element in f]  # drop the trailing newline
chain = u[0]
exons = []
# detect introns in the chain
for i in u:
    for j in range(len(chain)):
        if i == chain:  # skip the chain itself
            pass
        else:
            if match(i, chain[j:]):
                print("Intron %s has been detected in %d position" % (i, j))
                # A statement is needed here that appends the chain without the current intron to the exons list
The input data.txt consists of the chain in the first record, followed by the substrings (introns) that occur inside it, for example:
>Rosalind_10 # chain
ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
>Rosalind_12 #intron
ATCGGTCGAA
>Rosalind_15 #intron
ATCGGTCGAGCGTGT
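Now I am looking for the best slice statement that, on each iteration, selects only the exon sequence (the chain with the intron removed at the found position) and appends it to the exons list. What is an easy way to do this?
Thanks for your help,
Gleb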
Answer 0 (score: 0)
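Why not simply store the chain as a string and use str.split(intron)?
That way you do not need to search for the position, and you can join the remaining pieces back together afterwards.
I would do something like this; note that I simplified some of the expressions you used: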
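As a minimal sketch of that idea, here it is applied to the chain and the first intron from the question's sample data. The helper name `remove_intron` is made up for illustration and is not part of the original code.

```python
def remove_intron(chain, intron):
    """Return chain with every occurrence of intron cut out."""
    # str.split cuts the chain at each occurrence of the intron;
    # joining the pieces with an empty string glues the exons back together.
    return "".join(chain.split(intron))


# Sample data copied from the question
chain = "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG"
intron = "ATCGGTCGAA"

exon = remove_intron(chain, intron)
print(len(chain), len(exon))
```

Note that split removes every occurrence of the intron, not just the first one, which may or may not be what you want.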
# Opening the file with "with" is better because the file is closed automatically,
# so there is no close() call to forget
with open('input.txt', 'r') as f:
    data = f.readlines()[1::2]  # keep every second line (the sequences)
u = [element.rstrip('\n') for element in data]  # strip the trailing newlines
chain = u[0]  # the first sequence is the chain
exons = []
for intron in u:
    if intron == chain:  # skip the chain itself
        continue
    exon = chain.split(intron)  # cut the chain at each occurrence of the intron
    exon = "".join(exon)  # join the pieces back into a single chain
    exons.append(exon)
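Note that if the header lines are not filtered out, this code would treat a line such as ">Rosalind_12" as an intron, so you should check for that (consider an if/else statement).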
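One way to make that check explicit is to skip header lines by their leading ">" rather than relying on line positions, and then cut each intron out by slicing at the position where it is found, which is what the question asked for. This is only a sketch under assumptions: the function names `read_sequences` and `cut_out` are invented here, headers are assumed to start with ">", and each sequence is assumed to sit on a single line.

```python
def read_sequences(lines):
    """Collect sequence lines, skipping headers that start with '>'."""
    sequences = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # header or blank line, not a sequence
        sequences.append(line)
    return sequences


def cut_out(chain, intron):
    """Remove the first occurrence of intron from chain by slicing."""
    pos = chain.find(intron)
    if pos == -1:
        return chain  # intron not present, nothing to remove
    return chain[:pos] + chain[pos + len(intron):]


# Sample records copied from the question, given as a list of lines
sample = [
    ">Rosalind_10\n",
    "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG\n",
    ">Rosalind_12\n",
    "ATCGGTCGAA\n",
    ">Rosalind_15\n",
    "ATCGGTCGAGCGTGT\n",
]

sequences = read_sequences(sample)
chain, introns = sequences[0], sequences[1:]
exons = [cut_out(chain, intron) for intron in introns]
```

With a real file, the same function can be fed the open file object, since iterating over a file yields its lines, for example `with open('data.txt') as f: sequences = read_sequences(f)`.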