Heap, Python, code generation

Time: 2014-04-30 01:32:01

Tags: python heap

I want to generate a table of frequencies and codewords for the letters of a DNA sequence. The output I should get is:

Symbol: T Codeword:         000 Freq: 10
Symbol: T Codeword:         001 Freq: 15
Symbol: T Codeword:          01 Freq: 25
Symbol: T Codeword:           1 Freq: 50
Average VLC codeword length: 1.75 bits per symbol
Average fixed length codeword length: 2 bits per symbol

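However, I get a long decimal number for the average VLC codeword length, and the same for the fixed codeword length. On top of that, the fixed length should be greater than the VLC length, but mine is the other way around. I think I am using the log call incorrectly, but what exactly am I doing wrong? Here is the code: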

def main():
  dnaData = readFile()
  dataSymbol = symbol(dnaData)
  node = symNode(dataSymbol)
  heap = mkHeap(len(node), compareFunc)
  dataCollectNode(node, heap)

  while heap.size > 1:
      n1 = removeMin(heap)
      n2 = removeMin(heap)

      for element in n1.symbol:
          element.code = ('0' + element.code)

      for element in n2.symbol:
          element.code = ('1' + element.code)

      newNode = mkNode((n1.cumFreq+n2.cumFreq),(n1.symbol + n2.symbol))
      add(heap, newNode)

  print("Variable length code output...")
  print("---------------------------------------")
  total_different_symbols = 0
  heapNode = top(heap)
  for element in heapNode.symbol:
      print("Symbol: %2s " % element.name, end ='')
      print("Codeword: %8s " % element.code, end ='')
      print("Frequency: %5d " % element.freq)
      temp = int(element.freq)*len(element.code)

      total_different_symbols += temp

  total_different_symbols = total_different_symbols / heapNode.cumFreq
  print("Average VLC codeword length: ", total_different_symbols, " bits per symbols")
  average_fixed_length_codeword = log(total_different_symbols)
  print("Average fixed length codeword length: ", average_fixed_length_codeword, " bits per symbol")

Any hints?

1 answer:

Answer 0: (score: 0)

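Got it, I had to change it to log(total_different_symbols, 2). Assuming log here is math.log, calling it without a base returns the natural logarithm, so base 2 has to be passed explicitly to get a result in bits.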
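
For comparison, here is a minimal sketch of how the two averages in the expected output could be computed. It is not the asker's code. The symbol names A, C, G, T and the helper names average_vlc_length and fixed_length are assumptions made for illustration; the codewords and frequencies are taken from the expected output in the question. Note that in the posted code total_different_symbols already holds the average VLC length by the time log is called, so this sketch takes the base-2 logarithm of the number of distinct symbols instead, and rounds up because a fixed-length code needs a whole number of bits.

```python
import math

# Hypothetical symbol table: (name, codeword, frequency).
# The names are placeholders; codewords and frequencies follow the
# expected output shown in the question.
table = [("A", "000", 10), ("C", "001", 15), ("G", "01", 25), ("T", "1", 50)]


def average_vlc_length(rows):
    """Frequency-weighted mean codeword length, in bits per symbol."""
    total_freq = sum(freq for _, _, freq in rows)
    weighted_bits = sum(freq * len(code) for _, code, freq in rows)
    return weighted_bits / total_freq


def fixed_length(rows):
    """Bits needed to give every distinct symbol a fixed-length code."""
    distinct = len({name for name, _, _ in rows})
    # log2 of the symbol count, rounded up; at least 1 bit even for one symbol.
    return max(1, math.ceil(math.log2(distinct)))


avg_vlc = average_vlc_length(table)
fixed = fixed_length(table)

# Format the float so it does not print as a long decimal.
print("Average VLC codeword length: %.2f bits per symbol" % avg_vlc)
print("Average fixed length codeword length: %d bits per symbol" % fixed)
```

With the four frequencies above this prints 1.75 and 2 bits per symbol, matching the expected output. The "%.2f" format is one way to avoid the long decimal when the average does not come out as a round number.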