霍夫曼的编码和解码

时间:2013-09-06 14:47:19

标签: huffman-code

我必须基于 Huffman 算法构建压缩器。 到目前为止,我设法创建了具有每个字符频率的树,并为每个字符生成一个具有较少位数的表示。

这样的话,对于“好这个糖果”这个短语:

  

'o' 000, '' 001, 't' 0100, 'r' 0101, 'p' 0110, 'm' 0111, 'l' 1000, 'i' 1001, 'h' 1010, 'd' 1011, 'a'1100, 'u' 1101, 'g' 1110, 's' 1111

我现在遇到的问题是找到一种方法将树保存在存档中,这样我就可以重建它然后解压缩文件。

有什么建议吗?

我做了一些研究,但发现很难理解,所以如果你能详细解释,我会很感激。


我用来读取文件频率的代码是:

int main (int argc, char *argv[])
{
    int i;

    TipoSentinela *sentinela;

    TipoLista *no = NULL;

    Arv *arvore, *arvore2, *arvore3;

    int *repete = (int *) calloc (256, sizeof(int));

    if(argc == 2)
    {
    in = load_base(argv[1]);
    le_dados_arquivo (repete); //read the frequencies from the file
    sentinela = cria_lista (); //create a marker for the tree node list

    for (i = 0; i < 256; i++)
    {   
        if(repete[i] > 0 && i != 0) 
        {
            arvore = arv_cria (Cria_info (i, repete[i])); //create a tree node with the character i and the frequence of it in the file
            no = inicia_lista (arvore, no, sentinela); //create the list of tree nodes
        }
    }

    Ordena (sentinela); //sort the tree nodes list by the frequencies

    for(Seta_primeiro(sentinela); Tamanho_lista(sentinela) != 1; Move_marcador(sentinela))
    {
        Seta_primeiro(sentinela); //put the marker in the first element of the list
        no = Retorna_marcador(sentinela);
        arvore2 = Retorna_arvore (no); //return the tree represented by the list marker
        Move_marcador(sentinela); //put the marker to the next element
        arvore3 = Retorna_arvore (Retorna_marcador (sentinela)); //return the tree represented by the list marker
        arvore = Cria_pai (arvore2, arvore3); //create a tree node that will contain the both arvore2 and arvore3
        Insere_arvoreFinal (sentinela, arvore); //insert the node at the end of the list
        Remove_arvore (sentinela); //remove the node arvore2 from the list
        Remove_arvore (sentinela); //remove the node arvore3 from the lsit
        Ordena (sentinela); //sort the list again

    }

    out = load_out(argv[1]); //open the output file

    Codificacao (arvore); //generate the code from each node of the tree

    rewind(in);

    char c;

    while(!feof(in))
    {
        c = fgetc(in);

        if(c != EOF)
            arvore2 = Procura_info (arvore, c); //search the character c in the tree

        if(arvore2 != NULL)
            imprimebit(Retorna_codigo(arvore2), out); //write the code in the file
    }

    fclose(in);
    fclose(out);
    free(repete);
    arvore = arv_libera (arvore);
    Libera_Lista(sentinela);
    }

    return 0;
}

//bit_counter and cur_byte are global variables
void write_bit (unsigned char bit, FILE *f)
{
    static k = 0;
    if(k != 0)
    {
        if(++bit_counter == 8)
        {
            fwrite(&cur_byte,1,1,f);
            bit_counter = 0;
            cur_byte = 0;
        }
    }

    k = 1;

    cur_byte <<= 1;
    cur_byte |= ('0' != bit);
}

//aux is the code of a character in the tree
void imprimebit(char *aux, FILE *f)
{
    int i, j;

    if(aux == NULL)
        return;

    for(i = 0; i < strlen(aux); i++)
    {
        write_bit(aux[i], f); //write the bits of the code in the file
    }
}

有了这个,我可以在输出文件中编写所有字符的代码,但是我也看不到存储树的方法。

2 个答案:

答案 0 :(得分:1)

您无需发送树。只需发送长度。然后建立一致的算法将长度转换为两端的代码。一致性称为“规范”霍夫曼代码。您按长度对代码进行排序,并在每个长度内按符号排序。然后分配从0开始的代码。所以你最终会得到(_表示空格):

_ 000
o 001
a 0100
d 0101
g 0110
h 0111
i 1000
l 1001
m 1010
p 1011
r 1100
s 1101
t 1110
u 1111

答案 1 :(得分:0)

我确实找到了存储每个角色代码的方法。 例如: 我写树,从根开始向左下,然后向右。 所以,如果我的树就像

那样
         0
      /    \
    0        1
 /    \     /   \
'a'   'b' 'c'   'd'

我的文件的标题会像这样: 001 [8位来自'a'] 1 [8位来自b] 01 [8位来自c] 1 [8位来自d]

有了这个,我就可以重建我的树了。

我现在的问题是逐个读取文件头,要知道我必须创建一个新节点。