How to calculate the entropy of a DNA sequence in a FASTA file

Time: 2016-06-19 17:22:54

Tags: biopython

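I need to calculate the entropy of a DNA sequence in a FASTA file, but only for the region from base 10,000 to base 11,000. The code below is what I have so far; it computes the entropy of the whole sequence, and I need it restricted to the bases between positions 10,000 and 11,000.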

from math import log
from Bio import SeqIO

def logent(x):
    # Contribution of one frequency to the entropy; 0*log(0) is treated as 0
    if x<=0:
        return 0
    else:
        return -x*log(x)

def entropy(lis):
    return sum([logent(elem) for elem in lis])

for i in SeqIO.parse("hsvs.fasta", "fasta"):
    lisfreq1=[i.seq.count(base)*1.0/len(i.seq) for base in ["A", "C","G","T"]]
    print(i.id, entropy(lisfreq1))

1 Answer:

Answer 0: (score: 1)

Your sequence behaves like a string, so you can simply slice it, for example:

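from Bio import SeqIO

# Python slices are 0-based and half-open: this keeps indices 10000..11000.
# For 1-based positions 10,000..11,000, use i.seq[9999:11000] instead.
seq_start = 10000
seq_end = 11000 + 1
for i in SeqIO.parse("hsvs.fasta", "fasta"):
    sub_seq = i.seq[seq_start:seq_end]
    lisfreq1=[sub_seq.count(base)*1.0/len(sub_seq) for base in ["A", "C","G","T"]]
    print(i.id, entropy(lisfreq1))  # entropy() as defined in the question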
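
The slicing idea can also be wrapped in one helper that takes 1-based, inclusive positions, which is how base coordinates are usually quoted. The following is only a sketch: the name `window_entropy` and its parameters are hypothetical, it works on anything that converts to a string with `str()`, and, unlike the code above, it normalizes by the number of A/C/G/T characters in the window rather than by the window length, so ambiguous bases such as N are ignored. It uses the natural logarithm by default to match the question's code.

```python
from collections import Counter
from math import log

def window_entropy(seq, start, end, alphabet="ACGT", base=None):
    """Shannon entropy of seq over 1-based inclusive positions start..end.

    Hypothetical helper for illustration. Result is in nats by default;
    pass base=2 for bits. Characters outside `alphabet` are ignored.
    """
    # Convert 1-based inclusive coordinates to a 0-based half-open slice
    window = str(seq)[start - 1:end].upper()
    counts = Counter(ch for ch in window if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty window, or no recognized bases
    h = 0.0
    for n in counts.values():
        p = n / total
        h -= p * log(p)
    return h / log(base) if base else h

# Demonstration on a synthetic sequence (not real data)
print(window_entropy("ACGT" * 5000, 10000, 11000))
```

With Biopython, the same call should work per record inside the `SeqIO.parse` loop, for example `window_entropy(i.seq, 10000, 11000)`.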