霍夫曼树:穿越

时间:2014-11-24 12:08:10

标签: c# recursion traversal huffman-code

我不确定我是如何攻击穿越我的霍夫曼树的。树是正确的,我只是很难确定如何以一种好的方式遍历它。出于某种原因,我的遍历方法没有结果......

更新:清理代码,使其更加面向对象

节点类:

public class Node
{
    public int frekvens; //Frequency
    public char tegn; //Symbol
    public Node venstre; //Left child
    public Node høyre; //Right child
    public string s; //result string
    public string resultat;
    public Node (char c) // Node constructor containing symbol.
    {
        frekvens = 1;
        tegn = c;
    }

    public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children
        {
            frekvens = f;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public Node (Node node) // Node constructor containing a node
        {
            frekvens = node.frekvens;
            tegn = node.tegn;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public void ØkMed1() // Inkrement frequency by one
    {
        frekvens = frekvens + 1;
    }


    public char getVenstreTegn ()
    {
        return venstre.tegn;
    }

    public char getHøyreTegn ()
    {
        return venstre.tegn;
    }

    public int getVenstreFrekvens ()
    {
        return venstre.frekvens;
    }

    public int getHøyreFrekvens ()
    {
        return høyre.frekvens;
    }

    public int getFrekvens()
    {
        return frekvens;
    }


    public bool ErTegn(char c)
    {
        if ( c == tegn)
        {
            return false;
        }
        else
        {
            return true;
        }
    }

    //Pretty sure this does not work as intended
    public string traverser (Node n) //Traverse the tree
    {
        if (n.tegn != '\0') //If the node containes a symbol --> a leaf
        {
            resultat += s;  
        }
        else
        {
            if (n.getVenstreTegn() == '\0') //If left child does not have a symbol
            {
                s += "0";
                traverser(n.venstre);
            }
            if (n.getHøyreTegn() == '\0') //If right child does not have a symbol
            {
                s += "1";
                traverser(n.høyre);
            }
        }
        return resultat;
    }
    public string Resultat() //Used priviously to check if i got the correct huffman tree
    {
        string resultat;
        resultat = "Tegn: " + Convert.ToString(tegn) +"  frekvens: " + Convert.ToString(frekvens) + "\n";
        return resultat;
    }
}

Huffman_Tree课程:

public class Huffman_Tre
{
    string treString;
    List<Node> noder = new List<Node>();
    public Node rot;
    public void bygg (string input)
    {
        bool funnet; //Found
        char karakter; //character

        for (int i = 0; i < input.Length;i++) //Loops through string and sets character
            //with coresponding freqeuncy in the node list
        {   
            karakter = input[i];
            funnet = false; //default
            for (int j = 0; j< noder.Count; j++)
            {
                if (noder[j].ErTegn(karakter) == false) //if the character already exists
                {
                    noder[j].ØkMed1(); //inkrement frequency by one
                    funnet = true; 
                    break;
                }
            }
            if (!funnet) //if the character does not exist 
            {
                noder.Add(new Node(karakter)); //add the character to list
            }
        }
        //Sorting node list acending by frequency
        var sortertListe = noder.OrderBy(c => c.frekvens).ToList();

        noder = sortertListe; 

        do
        {
            noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1]));

            //Remove the leaf nodes
            noder.RemoveAt(0);
            noder.RemoveAt(0); 
        } while(noder.Count >= 2);

    }

    public Node getRot()
    {
        return rot;
    }

    public string visTre()
    {

        foreach (Node node in noder)
        {
            treString += node.Resultat();
        }
        return treString;
    }
    public bool erNull()
    {
        if (noder[0].tegn == '\0')
        {
            return true;
        }
        else
            return false;
    }
}

主程序:

private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e)
    {
        string input; //The string input I want to compress
        input = txtInput.Text; //initialize input to text input
        input = input.ToLower(); 
        txtOutput.Text = "";

        Huffman_Tre tre = new Huffman_Tre();

        tre.bygg(input);

        Node rot = new Node(tre.getRot());

        txtOutput.Text += rot.traverser(rot);
    }
}

1 个答案:

答案 0 :(得分:9)

当我剩下一点时间的时候,我在玩 C#6.0 的同时制作了一张霍夫曼树的例子。它没有进行优化(甚至到目前为止都没有!),但作为一个例子,它可以正常工作。它可以帮助您了解您的挑战和挑战。可能会出现。由于我的英语远比我的斯堪的纳维亚知识好,我使用英语命名,我希望你不要介意。

首先,让我们从保持频率的班级开始。

public sealed class HuffmanFrequencyTable
{       
    #region Properties
    /// <summary>
    /// Holds the characters and their corresponding frequencies
    /// </summary>
    public Dictionary<char, int> FrequencyTable  { get; set; } = new Dictionary<char, int>();

    #endregion

    #region Methods

    /// <summary>
    /// Clears the internal frequency table
    /// </summary>
    public void Clear()
    {
        FrequencyTable?.Clear();            
    }

    /// <summary>
    /// Accepts and parses a new line (string) which is then 
    /// merged with the existing dictionary or frequency table
    /// </summary>
    /// <param name="line">The line to parse</param>
    public void Accept(string line)
    {
        if (!string.IsNullOrEmpty(line))
        {
            line.GroupBy(ch => ch).
                 ToDictionary(g => g.Key, g => g.Count()).
                 ToList().
                 ForEach(x => FrequencyTable[x.Key] = x.Value);
        }
    }

    /// <summary>
    /// Performs a dump of the frequency table, ordering all characters, lowest frequency first.
    /// </summary>
    /// <returns>The frequency table in the format 'character [frequency]'</returns>
    public override string ToString()
    {            
        return FrequencyTable?.PrintFrequencies();
    }
    #endregion
}

请注意,ToString()方法使用能够转储&#39;的扩展方法。使用的词典的内容。扩展位于名为Helpers的静态类中,如下所示:

/// <summary>
/// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by it's values and 
/// printing the key and it's value between brackets.
/// </summary>
/// <typeparam name="TKey">Generic key</typeparam>
/// <typeparam name="TValue">Generic value type</typeparam>
/// <param name="dictionary">The dictionary</param>
/// <exception cref="ArgumentNullException">Throws an argument null exception if the provided dictionary is null</exception>
/// <returns></returns>
public static string PrintFrequencies<TKey, TValue>(this IDictionary<TKey, TValue> dictionary)
{
    if (dictionary == null)
        throw new ArgumentNullException("dictionary");

    var items = from kvp in dictionary
                orderby kvp.Value
                select kvp.Key + " [" + kvp.Value + "]";

    return string.Join(Environment.NewLine, items);
}

现在,有了这个FrequencyTable,我们就可以开始研究如何构建节点了。 Huffman使用二叉树,因此最好生成具有左右子节点的Node类。我也冒昧地在这里执行遍历算法。该课程建立如下:

public sealed class HuffmanNode
{
    #region Properties

    /// <summary>
    /// Holds the left node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Left { get; set; } = null;

    /// <summary>
    /// Holds the right node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Right { get; set; } = null;

    /// <summary>
    /// Holds the Character (or null) for this particular node
    /// </summary>
    public char? Character { get; set; } = null;

    /// <summary>
    /// Holds the frequency for this particular node, defaulted to 0
    /// </summary>
    public int Frequency { get; set; } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Traverses all nodes recursively returning the binary 
    /// path for the corresponding character that has been found.
    /// </summary>
    /// <param name="character">The character to find</param>
    /// <param name="data">The datapath (containing '1's and '0's)</param>
    /// <returns>The complete binary path for a character within a node</returns>
    public List<bool> Traverse(char? character, List<bool> data)
    {
        //Check the leafs for existing characters
        if (null == Left && null == Right)
        {
            //We're at an endpoint of our 'tree', so return it's data or nothing when the symbol
            //characters do not match
            return (bool)character?.Equals(Character) ? data : null;
        }
        else
        {
            List<bool> left = null;
            List<bool> right = null;

            //TODO: If possible refactor with proper C# 6.0 features
            if (null != Left)
            {
                List<bool> leftPath = new List<bool>(data);                    
                leftPath.Add(false); //Add a '0'
                left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node.
            }

            if (null != Right)
            {
                List<bool> rightPath = new List<bool>(data);                    
                rightPath.Add(true); //Add a '1'
                right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node
            }

            return (null != left) ? left : right;
        }
    }        

    #endregion
}

我在HuffmanTree类中使用Node类。从逻辑上讲,树是从节点构建的。相应的HuffmanTree以这种方式编写:

public sealed class HuffmanTree
{
    #region Fields
    /// <summary>
    /// Field for keeping the Huffman nodes in. Internally used.
    /// </summary>
    private List<HuffmanNode> nodes = new List<HuffmanNode>();        
    #endregion

    #region Properties

    /// <summary>
    /// Holds the Huffman tree
    /// </summary>
    public HuffmanNode Root   {  get; set; } = null;

    /// <summary>
    /// Holds the frequency table for all parsed characters
    /// </summary>
    public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable()

    /// <summary>
    /// Holds the amount of bits after encoding the tree.
    /// Primary usable for decoding.
    /// </summary>
    public int BitCountForTree  { get;  private set;  } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Builds the Huffman tree
    /// </summary>
    /// <param name="source">The source to build the Hufftree from</param>
    /// <exception cref="ArgumentNullException">Thrown when source is null or empty</exception>
    public void BuildTree(string source)
    {
        nodes.Clear(); //As we build a new tree, first make sure it's clean :)

        if (string.IsNullOrEmpty(source))
            throw new ArgumentNullException("source");
        else
        {
            Frequencies.Accept(source);

            foreach (KeyValuePair<char, int> symbol in Frequencies.FrequencyTable)
            {
                nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value });
            }

            while (nodes.Count > 1)
            {
                List<HuffmanNode> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList();

                if (orderedNodes.Count >= 2)
                {
                    List<HuffmanNode> takenNodes = orderedNodes.Take(2).ToList();

                    HuffmanNode parent = new HuffmanNode()
                    {
                        Character = null,
                        Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency,
                        Left = takenNodes[0],
                        Right = takenNodes[1]
                    };

                    //Remove the childnodes from the original node list and add the new parent node
                    nodes.Remove(takenNodes[0]);
                    nodes.Remove(takenNodes[1]);
                    nodes.Add(parent);
                }
            }

            Root = nodes.FirstOrDefault();
        }
    }

    /// <summary>
    /// Encodes a given string to the corresponding huffman encoding path
    /// </summary>
    /// <param name="source">The source to encode</param>
    /// <returns>The binary huffman representation of the source</returns>
    public BitArray Encode(string source)
    {
        if (!string.IsNullOrEmpty(source))
        {
            List<bool> encodedSource = new List<bool>();
            //Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source
            encodedSource.AddRange(source.SelectMany(character =>
                                        Root.Traverse(character, new List<bool>())
                                    ).ToList()
                                  );
            //For decoding, we might need the amount of bits to skip trailing bits.
            BitCountForTree = encodedSource.Count;
            return new BitArray(encodedSource.ToArray());
        }
        else return null;
    }

    /// <summary>
    /// Decodes a given binary path to represent it's string value
    /// </summary>
    /// <param name="bits">BitArray for traversing the tree</param>
    /// <returns></returns>
    public string Decode(BitArray bits)
    {
        HuffmanNode current = Root;
        string decodedString = string.Empty;

        foreach (bool bit in bits)
        {
            //Find the correct current node depending on the bit set or not set.
            current = (bit ? current.Right ?? current : current.Left ?? current);               

            if (current.IsLeaf())
            {
                decodedString += current.Character;
                current = Root;
            }
        }

        return decodedString;
    }

    #endregion
}

这段代码中有趣的是,我决定使用BitArrays来保存树的二进制路径。这里的public BitArray Encode(string source)方法包含脏黑客。我跟踪用于编码的总位数,并将其存储在BitCountForTree属性中。执行解码时,我将使用此属性删除可能出现的任何尾随位。有一种更好的方式来执行此操作,但我会留下那个让你知道的。

此外,该类使用为HuffmanNode编写的扩展方法。虽然这很简单:

  /// <summary>
  /// Determines whether a given Huffman node is a leaf or not.
  /// A node is considered to be a leaf when it has no childnodes
  /// </summary>
  /// <param name="node">A huffman node</param>
  /// <returns>True if no children are left, false otherwise</returns>
  public static bool IsLeaf(this HuffmanNode node)
  {
      return (null == node.Left && null == node.Right);
  }

此扩展方法可以方便地确定给定节点是否实际上是叶节点。叶子是一个没有子节点的节点,因此是二叉树的结尾(或者更好的是树的一个分支)。

现在有趣的部分,我如何在这里工作。我已经构建了一个包含3个文本框的Windows窗体应用程序。一个用于实际输入,一个用于二进制(编码)输出,另一个用于显示压缩结果。 我还放了两个简单的按钮,一个用于执行霍夫曼编码,另一个用于霍夫曼解码。

霍夫曼编码方法编写如下(仅在编码按钮的事件处理程序中):

 string input = tbInput.Text;
 Tree.BuildTree(input); //Build the huffman tree

 BitArray encoded = Tree.Encode(input); //Encode the tree

 //First show the generated binary output
 tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast<bool>().Select(bit => bit ? "1" : "0"));

  //Next, convert the binary output to the new characterized output string.       
  byte[] bytes = new byte[(encoded.Length / 8) + 1];
  encoded.CopyTo(bytes, 0);

  tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.

请注意,编码的二进制字符串没有任何尾随位。我将其留给C#的编码机制。这样做的缺点是,我必须在解码时跟踪它。

现在解码也不算太难。虽然,对于这个例子,我正在利用上面放置的编码代码生成的压缩输出。另外,我假设已经建立了霍夫曼树(以及它的频率表!!!)。通常,频率表存储在压缩文件中,因此可以重建。

 //First convert the compressed output to a bit array again again and skip trailing bits.            
 bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast<bool>().Take(Tree.BitCountForTree).ToArray();
 BitArray encoded = new BitArray( boolAr );

 string decoded = Tree.Decode(encoded);
 MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);

请注意我创建的脏黑客,因为Encoding.Default.GetBytes(tbOutput.Text)肯定会生成一个字节数组,它可能包含不需要解码的尾随位。因此,我只根据重建树获取实际需要的位数。

因此,在运行时,我的示例提供了以下输出,当使用&#39;世界着名句子&#39; &#34;快速的棕色狐狸跳过懒惰的程序员&#34;:

按下&#34; Huff编码&#34;按钮:

HuffEncode

按下&#34; Huff decode&#34;按钮:

HuffDecode

现在这段代码可以真正使用一些优化,因为您可能会考虑使用Arrays而不是Dictionaries。还有更多,但需要您考虑。