Encoding a list of 2-dice sums with better compression (byte-restricted)

Time: 2017-01-04 23:42:13

Tags: algorithm compression

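Imagine any game that uses two six-sided dice.

To store the history of a game, we want to record the sum produced by every roll of the dice over the whole game.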

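With traditional Huffman encoding, 7 has the highest probability, so it is encoded in 3 bits, while 2 and 12 need 5 bits each. In that scheme, one symbol is encoded with a variable code size.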
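The bit counts mentioned above can be checked with a small Huffman construction. This is only a sketch: the helper name `huffman_code_lengths` and the tie-breaking order are my own assumptions, so individual code lengths could differ under another tie-breaking rule, but the expected length of any Huffman code for this distribution is the same.

```python
import heapq
from fractions import Fraction

# Number of ways (out of 36) to roll each sum with two fair six-sided dice.
WAYS = {s: 6 - abs(s - 7) for s in range(2, 13)}

def huffman_code_lengths(weights):
    """Return {symbol: code length in bits} for a binary Huffman code.

    Hypothetical helper for illustration; only lengths are tracked,
    not the actual bit patterns.
    """
    # Heap entries: (total weight, tie-breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths

lengths = huffman_code_lengths(WAYS)
expected_bits = sum(Fraction(WAYS[s], 36) * lengths[s] for s in WAYS)
print(lengths)
print(float(expected_bits))  # average bits per roll
```

The average works out to 119/36, roughly 3.31 bits per roll, which is the baseline any fixed-size byte scheme would be compared against.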

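However, I am trying to work out an encoding in which one byte (8 bits) encodes different sequences of dice sums. In this case the code size is constant (8 bits), but the number of symbols per code is variable. A naive example: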

  • 0x00 = {2}
  • 0x01 = {3}
  • ...
  • 0x0A = {12}
  • 0x0B = {2,2}
  • 0x0C = {2,3}
  • 0x0D = {2,4}, etc.

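This way the decoder can read byte by byte, and every byte is independent of the others.

How can I find a mapping that compresses better?

Can you point me to some algorithms that address this kind of compression?

My thoughts so far:

The sequences of 1 sum can be assigned the values 0x00 to 0x0A (for sums 2 to 12). I can split the sequence {7} into {7,2}, {7,3}, ..., {7,12} and assign values to those sequences.

If I do this for the whole list of {7,x}, then I can remove {7} from the 1-sum values (because any sequence that starts with 7 can be reached through a 2-sum sequence). The resulting encoding would be: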

  • {2} - {6}
  • {8} - {12}
  • {7,2} - {7,12}

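Then, for example, I think that {6,6}, {6,7} or {6,8} could provide more "value" (higher probability) than {7,2} or {7,12}.

But if I remove {7,2} or {7,12}, I have to put {7} back into the list (otherwise {7,2} could not be expressed). Like this: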

  • {2} - {12}
  • {7,3} - {7,11}
  • {6,6} - {6,8}

So there should be some kind of "trade-off" in this problem.
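One rough way to compare the two candidate dictionaries above is to simulate rolls and parse them greedily, always taking the longest matching sequence. The sketch below assumes a greedy longest-match encoder, a fixed random seed, and hypothetical names (`two_dice_rolls`, `rolls_per_byte`, `scheme_a`, `scheme_b`); none of these come from the question itself.

```python
import random

def two_dice_rolls(n, seed=0):
    """Simulate n sums of two fair six-sided dice (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)]

def rolls_per_byte(words, rolls):
    """Greedy longest-match parse; return average rolls packed into one byte."""
    words = set(words)
    max_len = max(len(w) for w in words)
    pos = n_bytes = 0
    while pos < len(rolls):
        for length in range(max_len, 0, -1):
            candidate = tuple(rolls[pos:pos + length])
            if len(candidate) == length and candidate in words:
                pos += length
                n_bytes += 1
                break
        else:
            # No word matches (only possible at the very end): stop parsing.
            break
    return pos / n_bytes

# First dictionary: {2}-{6}, {8}-{12}, {7,2}-{7,12}
scheme_a = [(s,) for s in range(2, 13) if s != 7] + [(7, s) for s in range(2, 13)]
# Second dictionary: {2}-{12}, {7,3}-{7,11}, {6,6}-{6,8}
scheme_b = ([(s,) for s in range(2, 13)]
            + [(7, s) for s in range(3, 12)]
            + [(6, s) for s in range(6, 9)])

rolls = two_dice_rolls(200_000, seed=1)
print(rolls_per_byte(scheme_a, rolls))
print(rolls_per_byte(scheme_b, rolls))
```

Under these assumptions the second dictionary should pack slightly more rolls per byte than the first, which supports the intuition that swapping rare pairs for likelier ones pays off. Both use only a small part of the 256 available values, so this only illustrates the trade-off and is not a good final mapping.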

2 answers:

Answer 0: (score: 0)

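The solution here has, I believe, a rate of about 7.733629 bits per byte. (The Python 3 code that generated it, if you want to play with it: https://github.com/eisenstatdavid/huffman/blob/master/huffman.py) My algorithm is an EM-like procedure that alternates between (1) computing the stationary distribution of the first roll in a byte and (2) choosing up to 256 possible words, subject to the constraint that we can encode any infinite sequence. I conjecture that it is optimal, though I only know that this solution is a local maximum (and even then, only assuming my code has no bugs, etc.).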

{{2}, {12}, {7, 7}, {6, 7}, {8, 7}, {5, 7}, {9, 7}, {4, 7}, {10,
7}, {7, 6}, {7, 8}, {6, 6}, {6, 8}, {8, 6}, {8, 8}, {5, 6}, {5, 8},
{9, 6}, {9, 8}, {4, 6}, {4, 8}, {10, 6}, {10, 8}, {7, 5}, {7, 9},
{6, 5}, {6, 9}, {8, 5}, {8, 9}, {5, 5}, {5, 9}, {9, 5}, {9, 9}, {3,
7}, {11, 7}, {4, 5}, {4, 9}, {10, 5}, {10, 9}, {7, 4}, {7, 10}, {3,
6}, {3, 8}, {11, 6}, {11, 8}, {6, 4}, {6, 10}, {8, 4}, {8, 10}, {5,
4}, {5, 10}, {9, 4}, {9, 10}, {4, 4}, {4, 10}, {10, 4}, {10, 10},
{3, 5}, {3, 9}, {11, 5}, {11, 9}, {7, 3}, {7, 11}, {2, 7}, {12, 7},
{6, 3}, {6, 11}, {8, 3}, {8, 11}, {5, 3}, {5, 11}, {9, 3}, {9, 11},
{3, 4}, {3, 10}, {11, 4}, {11, 10}, {4, 3}, {4, 11}, {10, 3}, {10,
11}, {2, 6}, {2, 8}, {12, 6}, {12, 8}, {2, 5}, {2, 9}, {12, 5},
{12, 9}, {3, 3}, {3, 11}, {11, 3}, {11, 11}, {7, 2}, {7, 12}, {7,
7, 7}, {2, 4}, {2, 10}, {12, 4}, {12, 10}, {6, 2}, {6, 12}, {8, 2},
{8, 12}, {6, 7, 7}, {8, 7, 7}, {5, 2}, {5, 12}, {9, 2}, {9, 12},
{5, 7, 7}, {9, 7, 7}, {4, 2}, {4, 12}, {10, 2}, {10, 12}, {4, 7,
7}, {10, 7, 7}, {7, 6, 7}, {7, 7, 6}, {7, 7, 8}, {7, 8, 7}, {6, 6,
7}, {6, 8, 7}, {8, 6, 7}, {8, 8, 7}, {6, 7, 6}, {6, 7, 8}, {8, 7,
6}, {8, 7, 8}, {5, 6, 7}, {5, 7, 6}, {5, 7, 8}, {5, 8, 7}, {9, 6,
7}, {9, 7, 6}, {9, 7, 8}, {9, 8, 7}, {4, 6, 7}, {4, 7, 6}, {4, 7,
8}, {4, 8, 7}, {10, 6, 7}, {10, 7, 6}, {10, 7, 8}, {10, 8, 7}, {7,
6, 6}, {7, 6, 8}, {7, 8, 6}, {7, 8, 8}, {7, 5, 7}, {7, 7, 5}, {7,
7, 9}, {7, 9, 7}, {6, 6, 6}, {6, 6, 8}, {6, 8, 6}, {6, 8, 8}, {8,
6, 6}, {8, 6, 8}, {8, 8, 6}, {8, 8, 8}, {5, 6, 6}, {5, 6, 8}, {5,
8, 6}, {5, 8, 8}, {9, 6, 6}, {9, 6, 8}, {9, 8, 6}, {9, 8, 8}, {2,
3}, {2, 11}, {12, 3}, {12, 11}, {6, 5, 7}, {6, 7, 5}, {6, 7, 9},
{6, 9, 7}, {8, 5, 7}, {8, 7, 5}, {8, 7, 9}, {8, 9, 7}, {5, 5, 7},
{5, 7, 5}, {5, 7, 9}, {5, 9, 7}, {9, 5, 7}, {9, 7, 5}, {9, 7, 9},
{9, 9, 7}, {4, 6, 6}, {4, 6, 8}, {4, 8, 6}, {4, 8, 8}, {10, 6, 6},
{10, 6, 8}, {10, 8, 6}, {10, 8, 8}, {3, 2}, {3, 12}, {11, 2}, {11,
12}, {3, 7, 7}, {11, 7, 7}, {4, 5, 7}, {4, 7, 5}, {4, 7, 9}, {4,
9, 7}, {10, 5, 7}, {10, 7, 5}, {10, 7, 9}, {10, 9, 7}, {7, 5, 6},
{7, 5, 8}, {7, 9, 6}, {7, 9, 8}, {7, 6, 5}, {7, 6, 9}, {7, 8, 5},
{7, 8, 9}, {6, 6, 5}, {6, 6, 9}, {6, 8, 5}, {6, 8, 9}, {8, 6, 5},
{8, 6, 9}, {8, 8, 5}, {8, 8, 9}, {6, 5, 6}, {6, 5, 8}, {6, 9, 6},
{6, 9, 8}, {8, 5, 6}, {8, 5, 8}, {8, 9, 6}, {8, 9, 8}, {5, 5, 6},
{5, 5, 8}, {5, 6, 5}, {5, 6, 9}, {5, 8, 5}, {5, 8, 9}, {5, 9, 6},
{5, 9, 8}, {9, 5, 6}, {9, 5, 8}, {9, 6, 5}, {9, 6, 9}, {9, 8, 5},
{9, 8, 9}, {9, 9, 6}, {9, 9, 8}, {7, 4, 7}, {7, 7, 4}, {7, 7, 10},
{7, 10, 7}}

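Here is a simpler, suboptimal solution that packs about 7.438148 bits of entropy into a byte. The 251 codewords are all sequences of length 3 that begin with one of {5,7}, {6,6}, {6,7}, {6,8}, {7,5}, {7,6}, {7,7}, {7,8}, {7,9}, {8,6}, {8,7}, {8,8}, {9,7}, plus all sequences of length 2 that do not begin with one of those prefixes.

Chart of whether a third roll is taken, by the first two rolls: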

   2  3  4  5  6  7  8  9 10 11 12
 2 -  -  -  -  -  -  -  -  -  -  -
 3 -  -  -  -  -  -  -  -  -  -  -
 4 -  -  -  -  -  -  -  -  -  -  -
 5 -  -  -  -  -  X  -  -  -  -  -
 6 -  -  -  -  X  X  X  -  -  -  -
 7 -  -  -  X  X  X  X  X  -  -  -
 8 -  -  -  -  X  X  X  -  -  -  -
 9 -  -  -  -  -  X  -  -  -  -  -
10 -  -  -  -  -  -  -  -  -  -  -
11 -  -  -  -  -  -  -  -  -  -  -
12 -  -  -  -  -  -  -  -  -  -  -
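The simpler dictionary can be rebuilt from the chart and its figures checked numerically. The sketch below assumes the rolls are independent and that every byte starts on a fresh roll (which holds here because no codeword is a prefix of another); the names `EXTENDED`, `codewords` and `word_prob` are mine, not the answer's.

```python
from fractions import Fraction
from itertools import product
from math import log2

SUMS = range(2, 13)
# Probability of each two-dice sum.
P = {s: Fraction(6 - abs(s - 7), 36) for s in SUMS}

# First-two-roll pairs marked X in the chart: these take a third roll.
EXTENDED = {(5, 7), (6, 6), (6, 7), (6, 8), (7, 5), (7, 6), (7, 7),
            (7, 8), (7, 9), (8, 6), (8, 7), (8, 8), (9, 7)}

codewords = []
for pair in product(SUMS, repeat=2):
    if pair in EXTENDED:
        codewords.extend(pair + (third,) for third in SUMS)
    else:
        codewords.append(pair)

def word_prob(word):
    """Probability of a sequence of independent rolls."""
    prob = Fraction(1)
    for s in word:
        prob *= P[s]
    return prob

probs = [word_prob(w) for w in codewords]
entropy_bits = -sum(float(p) * log2(float(p)) for p in probs)
avg_rolls = sum(p * len(w) for p, w in zip(probs, codewords))

print(len(codewords))      # number of byte values used
print(entropy_bits)        # entropy carried by one byte
print(float(avg_rolls))    # average rolls stored per byte
```

The count comes out to 251 (13 × 11 three-roll words plus 121 − 13 two-roll words), and the entropy should land near the 7.438 bits quoted above, at a little under 2.3 rolls per byte.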

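Solutions in which the encoder may or may not pack in the next roll, depending on what that roll is, are hard to analyze: the probability distribution of the roll that starts the next byte is affected.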

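Answer 1: (score: 0)

Assuming you are interested in the sums and not in the order: for variable length, use Huffman {2:01110, 3:0110, 4:1100, 5:000, 6:001, 7:010, 8:100, 9:101, 10:111, 11:1101, 12:01111}; for fixed length, look into Tunstall coding.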
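For reference, a minimal sketch of the textbook Tunstall construction applied to the two-dice distribution: start with the 11 single sums and repeatedly replace the most probable word by its 11 one-roll extensions while the dictionary still fits in 256 entries. The function name `tunstall` and the `max_words` parameter are hypothetical, and ties are broken arbitrarily by tuple order.

```python
import heapq
from fractions import Fraction

# Probability of each two-dice sum.
P = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def tunstall(probs, max_words=256):
    """Build a Tunstall dictionary: {word (tuple of sums): probability}."""
    # Max-heap via negated probabilities; every entry is a current word.
    heap = [(-p, (sym,)) for sym, p in probs.items()]
    heapq.heapify(heap)
    # Expanding one word replaces it by len(probs) longer words.
    while len(heap) + len(probs) - 1 <= max_words:
        neg_p, word = heapq.heappop(heap)
        for sym, p in probs.items():
            heapq.heappush(heap, (neg_p * p, word + (sym,)))
    return {word: -neg_p for neg_p, word in heap}

words = tunstall(P)
avg_rolls = sum(p * len(w) for w, p in words.items())
print(len(words))         # byte values used
print(float(avg_rolls))   # average rolls per byte
```

Each expansion adds 10 words, so the dictionary stops at 11 + 24 × 10 = 251 entries and leaves 5 byte values unused. Because no word is a prefix of another, the decoder can still read one byte at a time.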