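I need to calculate the entropy of a DNA sequence in a FASTA file, from base 10,000 to base 11,000. This is what I have so far; it computes the entropy of the whole sequence, but I need the entropy of only the stretch between the 10,000th and the 11,000th base.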
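from math import log
from Bio import SeqIO

def logent(x):
    # One term of the Shannon entropy sum; 0*log(0) is treated as 0
    if x <= 0:
        return 0
    else:
        return -x * log(x)

def entropy(lis):
    # Natural log is used, so the result is in nats
    return sum([logent(elem) for elem in lis])

for i in SeqIO.parse("hsvs.fasta", "fasta"):
    lisfreq1 = [i.seq.count(base) * 1.0 / len(i.seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))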
Answer 0 (score: 1)
Your sequence is just a string, so you only need to slice it, for example:
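# Python slices are 0-based and end-exclusive, so this covers
# indices 10000 through 11000 inclusive (1001 bases).
seq_start = 10000
seq_end = 11000 + 1
for i in SeqIO.parse("hsvs.fasta", "fasta"):
    sub_seq = i.seq[seq_start:seq_end]
    lisfreq1 = [sub_seq.count(base) * 1.0 / len(sub_seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))  # entropy() as defined in the question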
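For trying the slicing idea without a FASTA file at hand, here is a minimal self-contained sketch. The function name `window_entropy`, its parameters, and the `demo` sequence are hypothetical and not part of the original code. It also makes two assumptions that differ from the snippets above: frequencies are normalized by the number of A/C/G/T characters in the window rather than by the window length, and an empty window returns 0 instead of raising a division error.

```python
from collections import Counter
from math import log


def window_entropy(seq, start, end, alphabet="ACGT"):
    """Shannon entropy (in nats) of seq[start:end].

    start and end follow Python slice rules: 0-based, end-exclusive.
    Only characters listed in `alphabet` are counted (assumption).
    """
    window = str(seq[start:end]).upper()
    counts = Counter(window)
    total = sum(counts[base] for base in alphabet)
    if total == 0:
        # Empty window, or no recognized bases in it
        return 0.0
    result = 0.0
    for base in alphabet:
        p = counts[base] / total
        if p > 0:
            result -= p * log(p)
    return result


# If "base 10,000 to base 11,000" is meant as 1-based and inclusive,
# the matching slice is [9999:11000].
demo = "ACGT" * 3000  # hypothetical 12,000-base sequence
print(window_entropy(demo, 9999, 11000))
```

Because the function converts the slice with `str()`, it should accept a Biopython `Seq` as well as a plain string, e.g. `window_entropy(i.seq, 9999, 11000)` inside the `SeqIO.parse` loop.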