How to get fragments from a DNA sequence

Time: 2015-07-17 14:18:06

Tags: python bioinformatics genome

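I want to cut a DNA genome into k-mers of any size, so I wrote the function Sliding_DNA(dna_list, size_to_split), but it does not work.

Can anyone help me?

When I print the variable pedazos, I get the following: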

'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC', 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT', 'TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT']

Code:

def Sliding_DNA(dna_list,size_to_split):
    # range to slide over
    #vecesRecorrer = int(len(dna_list) / 500)
    lista_temp = []

    #dna_to_split = dna_list[0]
    #print(dna_to_split)
    posiInicial = 0
    posiFinal = 0
    test = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGGGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATT'

    for nucleotide in test:
        pedazo = ""
        posiFinal = posiInicial + size_to_split
        for posiInicial in xrange(posiFinal):
            pedazo += nucleotide
            if len(pedazo)==size_to_split:
                lista_temp.append(pedazo)
        posiInicial += size_to_split

    return lista_temp


pedazos = Sliding_DNA(dna_list,100)

1 answer:

Answer 0 (score: 1)

The problem is caused by this line:

pedazo += posiInicial

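You assigned an empty string to pedazo, so it is a string, while posiInicial holds an integer. Python cannot apply + to a string and an integer, so this statement raises a TypeError.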
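A minimal sketch of that failure, assuming Python 3 (the code in the question uses Python 2's xrange, but mixing a string and an integer with + fails in both versions). The helper name try_concat is hypothetical and exists only for this illustration:

```python
def try_concat(text, number):
    """Attempt text + number and report what happens (hypothetical helper)."""
    try:
        return text + number
    except TypeError as exc:
        # Python refuses to guess whether '+' means concatenation or addition.
        return "TypeError: " + str(exc)


pedazo = ""          # a str, as in the question
posiInicial = 0      # an int, as in the question

result = try_concat(pedazo, posiInicial)
print(result)

# An explicit conversion removes the ambiguity.
as_text = pedazo + str(posiInicial)
print(as_text)
```

Which side to convert depends on what the variable is meant to hold: text or a running total.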

So initialize pedazo to 0 instead:

pedazo = 0
cont += 1
posiFinal = posiInicial + 500
for posiInicial in xrange(posiFinal):
    pedazo += posiInicial
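If the goal is the k-mers themselves rather than a sum of indices, string slicing is one possible approach. The following is only a sketch under assumptions, not the method from the question or the answer above: it assumes Python 3 (range instead of xrange), and the function name sliding_kmers and its step parameter are hypothetical names introduced for illustration.

```python
def sliding_kmers(sequence, k, step=1):
    """Return the k-length substrings of sequence, moving step bases each time."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    # The last valid start index is len(sequence) - k, so range stops one past it.
    return [sequence[start:start + k]
            for start in range(0, len(sequence) - k + 1, step)]


# First 16 bases of the test string from the question.
example = "AGCTTTTCATTCTGAC"

overlapping = sliding_kmers(example, 4)              # window moves 1 base at a time
non_overlapping = sliding_kmers(example, 4, step=4)  # consecutive 4-base chunks

print(overlapping[:3])    # ['AGCT', 'GCTT', 'CTTT']
print(non_overlapping)    # ['AGCT', 'TTTC', 'ATTC', 'TGAC']
```

With step=1 the windows overlap, which is the usual meaning of k-mers; setting step equal to k gives consecutive chunks instead. A sequence shorter than k yields an empty list.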