用PHP创建一个霍夫曼树

时间:2013-02-07 13:09:15

标签: php arrays recursion tree huffman-code

我正在尝试使用PHP创建一个Huffman树。

这就是我所拥有的:

<!DOCTYPE html>
<html>
    <body>
        <?php
            $extract = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse elementum imperdiet aliquet. Duis non molestie orci. Ut eget nibh nec augue ultricies porttitor.';
            $characters = count_chars($extract, 1);            
            asort($characters);
            foreach($characters as $character => $occurrence)
            {
               echo 'There';
               if($occurrence > 1)
               {
                   echo ' were ' . $occurrence . ' occurrences of ';
               }
               else
               {
                   echo ' was '. $occurrence . ' occurrence of ';
               }
               echo '"<strong>' . chr($character) . '</strong>" in the extract.<br />';
               $characterFreq[chr($character)] = $occurrence;              
            }
            print_r($characterFreq);
        ?>
    </body>
</html>

哪个输出:

There was 1 occurrence of "S" in the extract.
There was 1 occurrence of "U" in the extract.
There was 1 occurrence of "h" in the extract.
There was 1 occurrence of "L" in the extract.
There was 1 occurrence of "D" in the extract.
There was 1 occurrence of "," in the extract.
There was 1 occurrence of "q" in the extract.
There was 1 occurrence of "b" in the extract.
There were 3 occurrences of "g" in the extract.
There were 4 occurrences of "a" in the extract.
There were 4 occurrences of "." in the extract.
There were 4 occurrences of "d" in the extract.
There were 5 occurrences of "p" in the extract.
There were 6 occurrences of "c" in the extract.
There were 6 occurrences of "l" in the extract.
There were 7 occurrences of "m" in the extract.
There were 8 occurrences of "n" in the extract.
There were 8 occurrences of "r" in the extract.
There were 9 occurrences of "u" in the extract.
There were 9 occurrences of "o" in the extract.
There were 10 occurrences of "s" in the extract.
There were 15 occurrences of "t" in the extract.
There were 17 occurrences of "i" in the extract.
There were 20 occurrences of "e" in the extract.
There were 22 occurrences of " " in the extract.
Array ( [S] => 1 [U] => 1 [h] => 1 [L] => 1 [D] => 1 [,] => 1 [q] => 1 [b] => 1 [g] => 3 [a] => 4 [.] => 4 [d] => 4 [p] => 5 [c] => 6 [l] => 6 [m] => 7 [n] => 8 [r] => 8 [u] => 9 [o] => 9 [s] => 10 [t] => 15 [i] => 17 [e] => 20 [ ] => 22 )

我一直在使用array_slice()array_splice()array_unshift()的混合物,但在递归方面遇到了麻烦。

理想情况下,叶子和分支将由数组索引0和&amp;表示。 1。

关于如何以多维数组形式制作霍夫曼树的任何想法都将不胜感激。

1 个答案:

答案 0 :(得分:0)

这是PHP问题的完整解决方案:

function huffmannEncode($string) {
    $originalString = $string;
    $occurences = array();

    while (isset($string[0])) {
        $occurences[] = array(substr_count($string, $string[0]), $string[0]);
        $string = str_replace($string[0], '', $string);
    }

    sort($occurences);
    while (count($occurences) > 1) {
        $row1 = array_shift($occurences);
        $row2 = array_shift($occurences);
        $occurences[] = array($row1[0] + $row2[0], array($row1, $row2));
        sort($occurences);
    }

    // $dictionary is an array that gets filled with the values with a recursive method
    $dictionary = [];
    fillDictionary($dictionary, is_array($occurences[0][1]) ? $occurences[0][1] : $occurences);

    // Generate the final encoded message
    $encoded = '';
    for($i = 0; $i < strlen($originalString); $i++) {
        $encoded .= $dictionary[$originalString[$i]];
    }
    return $encoded;
}

// This function runs recursively to generate the Huffmann tree
function fillDictionary(&$dictionary, $data, $value = '') {
    if (!is_array($data[0][1])) {
        $dictionary[$data[0][1]] = $value.'0';
    } else {
        fillDictionary($dictionary, $data[0][1], $value.'0');
    }
    if (isset($data[1])) {
        if (!is_array($data[1][1])) {
            $dictionary[$data[1][1]] = $value.'1';
        } else {
            fillDictionary($dictionary, $data[1][1], $value.'1');
        }
    }
}

此函数计算出现次数,并使用递归子函数生成为每个字符分配二进制代码的字典。

以下是一个例子:

// Test the functionality:
echo huffmannEncode('hello world');
// Output: 00100010101101110011110010101111
// And the dictionary:
/* 
[e] => 000
[h] => 001
[r] => 010
[w] => 011
[l] => 10
[o] => 110
[ ] => 1110
[d] => 1111
*/