Removing introns from a given sequence

Posted: 2014-04-12 07:55:25

Tags: python bioinformatics

Below is a simple script that detects introns along a given DNA sequence:

from re import match

f= open('data.txt', 'r').readlines()[1::2]
u = [element.rstrip('\n') for element in f]
chain=u[0]
exons=[]

#detect introns in chains
for i in u:
    for j in range(len(chain)):
        if i==chain: #pass sequence itself
            pass
        else:
            if match(i, chain[j::]):
                print("Intron %s has been detected at position %d" % (i, j))
                # Missing statement: put the chain without the current intron into the exons list

Here the input file data.txt consists of the chain followed by the intron substrings, for example:

>Rosalind_10 # chain
ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
>Rosalind_12 #intron
ATCGGTCGAA
>Rosalind_15 #intron
ATCGGTCGAGCGTGT

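Now I'm looking for the best way to write a slicing statement that, in each loop iteration, selects only the exon sequence (the chain with the intron removed at the position found) and appends it to the exons list. How can this be done easily?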
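For reference, the slicing asked about here can be sketched as follows, assuming the intron starts at position j as found by the loop above: keep everything before j and everything after j + len(intron). This is a hedged sketch, not the original script: the function name exons_by_slicing is hypothetical, and it uses str.startswith(intron, j) in place of re.match, which behaves the same for plain A/C/G/T strings.

```python
def exons_by_slicing(chain, introns):
    """For each intron occurrence, return chain with that occurrence sliced out."""
    exons = []
    for intron in introns:
        if intron == chain:
            continue  # skip the chain itself
        for j in range(len(chain)):
            if chain.startswith(intron, j):
                # everything before the intron + everything after it
                exons.append(chain[:j] + chain[j + len(intron):])
    return exons


chain = ("ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGG"
         "TCGAGCGTGTTTCAAAGTTTGCGCCTAG")
introns = [chain, "ATCGGTCGAA", "ATCGGTCGAGCGTGT"]
exons = exons_by_slicing(chain, introns)
```

Each intron in the sample occurs exactly once, so exons ends up with one entry per intron.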

Thanks for your help,

Gleb

1 Answer:

Answer 0 (score: 0)

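Why not simply keep the chain as a string and use str.split(intron)? That way you don't need to search for positions, and afterwards you can join the remaining pieces back into a single sequence.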
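To illustrate what split and join do here, a small sketch on a made-up toy sequence (not taken from the question). Note that split removes every occurrence of the intron, not only the one at a particular position, and that "".join(seq.split(x)) gives the same result as seq.replace(x, "").

```python
seq = "GGAATTCCAATTGG"      # toy sequence containing the "intron" AATT twice
pieces = seq.split("AATT")  # the parts on either side of each occurrence
exon = "".join(pieces)      # glue the remaining parts back together
assert exon == seq.replace("AATT", "")  # equivalent one-step form
```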

I would do something like the following; note that I simplified some of the expressions you used:

# Opening a file this way is better because you don't need close() as in your code
# (which, the way you did it, is easy to forget)
with open('input.txt', 'r') as f:
    data = f.readlines()[1::2]  # keep only the sequence lines
u = [element.rstrip('\n') for element in data]  # drop trailing newlines
chain = u[0]  # the first sequence (already a string) is the chain
exons = []

for intron in u:
    if intron == chain:
        continue  # skip the chain itself
    exon = chain.split(intron)  # split the chain at every occurrence of the intron
    exon = "".join(exon)  # join the remaining pieces back into a single chain
    exons.append(exon)

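Note that this code could pick up a header line such as ">Rosalind_12" as an intron (for instance if a sequence spans several lines, so the [1::2] slice no longer lines up), so you should check for it (consider an if/else statement).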
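One way to do that check is to parse the file by its header lines instead of by line parity. A minimal sketch, assuming the file follows the usual FASTA layout (every record starts with a line beginning with ">" and its sequence may be wrapped over several lines) and that the "# chain" / "#intron" annotations shown in the question are not part of the real file. The name read_fasta is hypothetical; sequence lines appearing before the first header are ignored.

```python
from io import StringIO


def read_fasta(handle):
    """Return a list of (header, sequence) pairs from a FASTA text stream."""
    records = []
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []  # start a new record
        else:
            parts.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(parts)))
    return records


sample = StringIO(
    ">Rosalind_10\n"
    "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGG"
    "TCGAGCGTGTTTCAAAGTTTGCGCCTAG\n"
    ">Rosalind_12\nATCGGTCGAA\n"
    ">Rosalind_15\nATCGGTCGAGCGTGT\n"
)
records = read_fasta(sample)
chain = records[0][1]                       # first record is the chain
introns = [seq for _, seq in records[1:]]   # the rest are introns
```

With a real file the call would be `with open('data.txt') as f: records = read_fasta(f)`; header lines never end up in the intron list, so no extra if/else is needed in the removal loop.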