我必须基于 Huffman 算法构建压缩器。 到目前为止,我设法创建了具有每个字符频率的树,并为每个字符生成一个具有较少位数的表示。
这样的话,对于“好这个糖果”这个短语:
'o' 000, '' 001, 't' 0100, 'r' 0101, 'p' 0110, 'm' 0111, 'l' 1000, 'i' 1001, 'h' 1010, 'd' 1011, 'a'1100, 'u' 1101, 'g' 1110, 's' 1111
我现在遇到的问题是找到一种方法将树保存在存档中,这样我就可以重建它然后解压缩文件。
有什么建议吗?
我做了一些研究,但发现很难理解,所以如果你能详细解释,我会很感激。
我用来读取文件频率的代码是:
int main (int argc, char *argv[])
{
int i;
TipoSentinela *sentinela;
TipoLista *no = NULL;
Arv *arvore, *arvore2, *arvore3;
int *repete = (int *) calloc (256, sizeof(int));
if(argc == 2)
{
in = load_base(argv[1]);
le_dados_arquivo (repete); //read the frequencies from the file
sentinela = cria_lista (); //create a marker for the tree node list
for (i = 0; i < 256; i++)
{
if(repete[i] > 0 && i != 0)
{
arvore = arv_cria (Cria_info (i, repete[i])); //create a tree node with the character i and the frequence of it in the file
no = inicia_lista (arvore, no, sentinela); //create the list of tree nodes
}
}
Ordena (sentinela); //sort the tree nodes list by the frequencies
for(Seta_primeiro(sentinela); Tamanho_lista(sentinela) != 1; Move_marcador(sentinela))
{
Seta_primeiro(sentinela); //put the marker in the first element of the list
no = Retorna_marcador(sentinela);
arvore2 = Retorna_arvore (no); //return the tree represented by the list marker
Move_marcador(sentinela); //put the marker to the next element
arvore3 = Retorna_arvore (Retorna_marcador (sentinela)); //return the tree represented by the list marker
arvore = Cria_pai (arvore2, arvore3); //create a tree node that will contain the both arvore2 and arvore3
Insere_arvoreFinal (sentinela, arvore); //insert the node at the end of the list
Remove_arvore (sentinela); //remove the node arvore2 from the list
Remove_arvore (sentinela); //remove the node arvore3 from the lsit
Ordena (sentinela); //sort the list again
}
out = load_out(argv[1]); //open the output file
Codificacao (arvore); //generate the code from each node of the tree
rewind(in);
char c;
while(!feof(in))
{
c = fgetc(in);
if(c != EOF)
arvore2 = Procura_info (arvore, c); //search the character c in the tree
if(arvore2 != NULL)
imprimebit(Retorna_codigo(arvore2), out); //write the code in the file
}
fclose(in);
fclose(out);
free(repete);
arvore = arv_libera (arvore);
Libera_Lista(sentinela);
}
return 0;
}
//bit_counter and cur_byte are global variables
void write_bit (unsigned char bit, FILE *f)
{
static k = 0;
if(k != 0)
{
if(++bit_counter == 8)
{
fwrite(&cur_byte,1,1,f);
bit_counter = 0;
cur_byte = 0;
}
}
k = 1;
cur_byte <<= 1;
cur_byte |= ('0' != bit);
}
//aux is the code of a character in the tree
void imprimebit(char *aux, FILE *f)
{
int i, j;
if(aux == NULL)
return;
for(i = 0; i < strlen(aux); i++)
{
write_bit(aux[i], f); //write the bits of the code in the file
}
}
有了这个,我可以在输出文件中编写所有字符的代码,但是我也看不到存储树的方法。
答案 0 :(得分:1)
您无需发送树。只需发送长度。然后建立一致的算法将长度转换为两端的代码。一致性称为“规范”霍夫曼代码。您按长度对代码进行排序,并在每个长度内按符号排序。然后分配从0开始的代码。所以你最终会得到(_表示空格):
_ 000
o 001
a 0100
d 0101
g 0110
h 0111
i 1000
l 1001
m 1010
p 1011
r 1100
s 1101
t 1110
u 1111
答案 1 :(得分:0)
我确实找到了存储每个角色代码的方法。 例如: 我写树,从根开始向左下,然后向右。 所以,如果我的树就像
那样 0
/ \
0 1
/ \ / \
'a' 'b' 'c' 'd'
我的文件的标题会像这样: 001 [8位来自'a'] 1 [8位来自b] 01 [8位来自c] 1 [8位来自d]
有了这个,我就可以重建我的树了。
我现在的问题是逐个读取文件头,要知道我必须创建一个新节点。