Understanding the Sardinas-Patterson algorithm

Time: 2015-11-05 22:18:50

Tags: algorithm binary data-compression

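I am trying to apply the Sardinas-Patterson algorithm to this code: C = {0, 01, 0111, 01111, 11110}

but I don't understand how to do it.

I started with: 0 is a prefix of 01 -> the dangling suffix is 1

List = {0, 01, 0111, 01111, 11110, 1}

0 is a prefix of 0111 -> the dangling suffix is 111

List = {0, 01, 0111, 01111, 11110, 1, 111}

0 is a prefix of 01111 -> the dangling suffix is 1111

List = {0, 01, 0111, 01111, 11110, 1, 111, 1111}

but I don't know how to continue...

Thanks a lot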

1 answer:

Answer 0 (score: 0)

Here is my interpretation of the algorithm described by Wikipedia.

First, create a function that takes two sets as input and produces the set of all dangling suffixes:

SET findSuffixes( SET a, SET b )
{
    SET result = {}
    for each item x in a
        for each item y in b
        {
            if x is a proper prefix of y
                result.additem( y - x )    // y with the leading x removed
            else if y is a proper prefix of x
                result.additem( x - y )    // x with the leading y removed
        }
    return result
}
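As a rough illustration, the pseudocode above could be written in Python along these lines. This is only a sketch: it assumes codewords are stored as Python strings in a `set`, and the name `find_suffixes` is just a Python-style spelling of `findSuffixes`. Pairs where both strings are equal are skipped, because the leftover would be empty.

```python
def find_suffixes(a, b):
    """Return every non-empty dangling suffix between the string sets a and b."""
    result = set()
    for x in a:
        for y in b:
            if x != y and y.startswith(x):
                result.add(y[len(x):])   # x is a proper prefix of y: keep the rest of y
            elif x != y and x.startswith(y):
                result.add(x[len(y):])   # y is a proper prefix of x: keep the rest of x
    return result


codewords = {"0", "01", "0111", "01111", "11110"}
print(sorted(find_suffixes(codewords, codewords)))
# prints ['1', '11', '111', '1111']
```

Using a `set` for the result means duplicates such as the two ways of obtaining 111 collapse automatically.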

Create a function that checks whether any codeword appears in a given set:

BOOL containsCodeword( SET a, SET codewords )

Create a function that determines whether two sets are identical:

BOOL setIsEqualToSet( SET a, SET b )
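The answer only gives the signatures of these two helpers. If the sets are Python `set` objects holding strings (an assumption of this sketch, as are the snake_case names), both can be one-liners built on the standard set operations:

```python
def contains_codeword(a, codewords):
    """True if at least one element of a is also a codeword."""
    return not a.isdisjoint(codewords)


def set_is_equal_to_set(a, b):
    """True if a and b hold exactly the same elements."""
    return a == b


codewords = {"0", "01", "0111", "01111", "11110"}
print(contains_codeword({"1", "11", "111", "1111"}, codewords))   # prints False
print(contains_codeword({"1110", "110", "10", "0"}, codewords))   # prints True
print(set_is_equal_to_set({"1", "11"}, {"11", "1"}))              # prints True
```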

The algorithm then looks like this:

(A) SET codewords = { 0, 01, 0111, 01111, 11110 }
(B) SET current = findSuffixes( codewords, codewords )
(C) LIST_OF_SETS previous = {}
    forever
    {
(D)     if ( containsCodeword( current, codewords) )
           then the code is not uniquely decodable, done
        for ( each set x in previous )
(E)        if ( setIsEqualToSet( current, x ) )
              then the code is uniquely decodable, done

(F)     previous.additem( current )
(G)     current = findSuffixes( current, codewords )
    }
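Steps (A) through (G) can be put together into one self-contained Python sketch. The function names here are illustrative choices, not part of the original answer, and codewords are assumed to be non-empty strings. The loop ends because every dangling suffix is a suffix of some codeword, so only finitely many distinct sets can appear before one repeats.

```python
def find_suffixes(a, b):
    """Return every non-empty dangling suffix between the string sets a and b."""
    result = set()
    for x in a:
        for y in b:
            if x != y and y.startswith(x):
                result.add(y[len(x):])
            elif x != y and x.startswith(y):
                result.add(x[len(y):])
    return result


def is_uniquely_decodable(codewords):
    """Follow steps (A)-(G): True if the code is uniquely decodable."""
    codewords = set(codewords)                       # (A)
    current = find_suffixes(codewords, codewords)    # (B)
    previous = []                                    # (C)
    while True:
        if current & codewords:                      # (D) a dangling suffix is a codeword
            return False
        if current in previous:                      # (E) this set was seen before
            return True
        previous.append(current)                     # (F)
        current = find_suffixes(current, codewords)  # (G)


print(is_uniquely_decodable({"0", "01", "0111", "01111", "11110"}))  # prints False
print(is_uniquely_decodable({"0", "10", "110"}))                     # prints True
print(is_uniquely_decodable({"0", "01", "11"}))                      # prints True
```

The second call is a prefix code, so the first set of dangling suffixes is already empty; the third is not a prefix code but is still uniquely decodable.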

Here is a worked example:

(A) codewords = { 0, 01, 0111, 01111, 11110 }
(B) findSuffixes( codewords, codewords )
   0     - 0 = ignored since the result is empty
   01    - 0 = 1
   0111  - 0 = 111
   01111 - 0 = 1111
   11110 - 0 = rejected since 11110 doesn't start with 0

   0     - 01 = rejected
   01    - 01 = ignored
   0111  - 01 = 11
   01111 - 01 = 111
   11110 - 01 = rejected

   0     - 0111 = rejected
   01    - 0111 = rejected
   0111  - 0111 = ignored
   01111 - 0111 = 1 
   11110 - 0111 = rejected

   0     - 01111 = rejected
   01    - 01111 = rejected
   0111  - 01111 = rejected
   01111 - 01111 = ignored
   11110 - 01111 = rejected

   0     - 11110 = rejected
   01    - 11110 = rejected
   0111  - 11110 = rejected
   01111 - 11110 = rejected
   11110 - 11110 = ignored

    After removing duplicates, current = { 1, 11, 111, 1111 }

(D) current does not contain any of the codewords
(E) the previous list is empty, so no set in the previous list matches current
(F) at this point the set { 1, 11, 111, 1111 } is added to the previous list

(G) findSuffixes( current, codewords )
   there are 40 cases to consider
   20 cases where a member of the current set is removed from a member of codewords
   20 cases where a member of codewords is removed from a member of the current set
   most cases are rejected, the only interesting cases are

   11110 - 1    = 1110
   11110 - 11   = 110
   11110 - 111  = 10
   11110 - 1111 = 0

    So current = { 1110, 110, 10, 0 }

(D) current does contain one of the codewords, so the code is not uniquely decodable
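
As a sanity check on that conclusion, one concrete ambiguous string is 011110, which splits both as 0 | 11110 and as 01111 | 0. This string was found by hand and is not part of the original answer. The hypothetical helper `count_parsings` below counts how many ways a message can be split into codewords, so a result above 1 confirms the ambiguity:

```python
def count_parsings(message, codewords):
    """Count the ways message can be split into a sequence of codewords."""
    # ways[i] = number of ways to split the first i characters into codewords
    ways = [1] + [0] * len(message)
    for i in range(1, len(message) + 1):
        for w in codewords:
            if i >= len(w) and message[i - len(w):i] == w:
                ways[i] += ways[i - len(w)]
    return ways[len(message)]


codewords = {"0", "01", "0111", "01111", "11110"}
print(count_parsings("011110", codewords))   # prints 2
```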