Question

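This is a question about the bzip2 archive format. Every bzip2 archive consists of a file header, one or more blocks, and a trailer structure. Every block should start with "1AY&SY", a 6-byte BCD-encoded number, namely 0x314159265359. According to {{3}}: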

/*--
  A 6-byte block header, the value chosen arbitrarily
  as 0x314159265359 :-).  A 32 bit value does not really
  give a strong enough guarantee that the value will not
  appear by chance in the compressed datastream.  Worst-case
  probability of this event, for a 900k block, is about
  2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
  For a compressed file of size 100Gb -- about 100000 blocks --
  only a 48-bit marker will do.  NB: normal compression/
  decompression do *not* rely on these statistical properties.
  They are only important when trying to recover blocks from
  damaged files.
--*/
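
A rough way to see where the comment's numbers come from (an assumption on my part; the comment does not say how they were derived): if the compressed bits behave like random data, each of the roughly 900000 × 8 bit offsets in a 900k block matches a fixed k-bit marker with probability 2^-k. The expected number of accidental matches is therefore about 7.2e6 / 2^k per block. Here is a minimal Python version of that union-bound estimate. BLOCK_BYTES and expected_false_markers are illustrative names, not part of bzip2.

```python
# Rough union-bound estimate (an assumption, not necessarily how the bzip2
# author derived the figures): treat compressed data as random bits, so each
# bit offset matches a fixed k-bit marker with probability 2**-k.

BLOCK_BYTES = 900_000            # assumed size of a "900k" block
BIT_POSITIONS = BLOCK_BYTES * 8  # candidate start offsets, ignoring edge effects


def expected_false_markers(marker_bits: int, blocks: int = 1) -> float:
    """Expected number of accidental marker matches across `blocks` blocks."""
    return blocks * BIT_POSITIONS / 2 ** marker_bits


if __name__ == "__main__":
    for bits in (32, 40, 48):
        per_block = expected_false_markers(bits)
        per_file = expected_false_markers(bits, blocks=100_000)
        print(f"{bits}-bit marker: {per_block:.1e} per block, "
              f"{per_file:.1e} per ~100000 blocks")
```

This gives about 1.7e-3 per block for 32 bits, 6.5e-6 for 40 bits and 2.6e-8 for 48 bits. That is the same order of magnitude as the comment's figures. Over 100000 blocks it predicts roughly 170 spurious 32-bit matches, about 0.65 for 40 bits, and only about 0.003 for 48 bits, which is in line with "only a 48-bit marker will do".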

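The question is: is it really true that every block in every bzip2 archive starts on a byte boundary? I mean all archives created by the reference implementation of bzip2, i.e. the bzip2-1.0.5+ utilities.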

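I suspect bzip2 may treat the stream not as a byte stream but as a bit stream (the blocks themselves are Huffman-coded, and Huffman codes are not byte-aligned by design).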

So, in other words: is grep -c '1AY&SY' greater than (Huffman coding could produce 1AY&SY inside a block) or equal to the number of bzip2 blocks in the file?
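
If the blocks are bit-aligned, a byte-oriented search such as grep can only see a block marker that happens to start on a byte boundary. Also, grep -c counts matching lines rather than occurrences, so it is an approximation in any case. The following minimal Python sketch shows the difference. It assumes MSB-first bit order, which is the order bzip2 writes bits in. bit_offsets and embed_at_bit are hypothetical helpers written only for this illustration.

```python
# Sketch: locate a 48-bit marker at *any* bit offset (MSB-first bit order)
# and compare with a byte-aligned search like grep's.
# bit_offsets and embed_at_bit are hypothetical helpers for illustration.

BLOCK_MAGIC = 0x314159265359   # the bytes b"1AY&SY" when byte-aligned
MAGIC_BITS = 48


def bit_offsets(data: bytes, magic: int = BLOCK_MAGIC, width: int = MAGIC_BITS) -> list[int]:
    """Return every bit offset at which `magic` starts in `data`."""
    mask = (1 << width) - 1
    window = 0          # holds the last `width` bits read
    hits = []
    for i, byte in enumerate(data):
        for j in range(7, -1, -1):              # most significant bit first
            window = ((window << 1) | ((byte >> j) & 1)) & mask
            bits_read = i * 8 + (8 - j)
            if bits_read >= width and window == magic:
                hits.append(bits_read - width)
    return hits


def embed_at_bit(offset: int, total_bytes: int) -> bytes:
    """Build a zero-filled buffer with BLOCK_MAGIC starting at bit `offset`."""
    total_bits = total_bytes * 8
    if offset + MAGIC_BITS > total_bits:
        raise ValueError("marker does not fit in the buffer")
    shifted = BLOCK_MAGIC << (total_bits - offset - MAGIC_BITS)
    return shifted.to_bytes(total_bytes, "big")


if __name__ == "__main__":
    for offset in (8, 3):   # byte-aligned vs. 3 bits into the first byte
        buf = embed_at_bit(offset, 8)
        print(f"offset {offset}: byte search finds {buf.count(b'1AY&SY')}, "
              f"bit scan finds {bit_offsets(buf)}")
```

The byte search finds the marker only in the aligned case, while the bit scan finds it in both. Scanning every bit offset like this is roughly what a recovery tool such as bzip2recover has to do. It is slow in pure Python, but it shows the idea.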

Answer 1

bzip2 works on a bit stream.

From http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html:

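Anyway, the important thing is that a BZIP2 file contains one or more "streams", which are byte-aligned. Each stream contains one (zero?) or more "blocks", which are not byte-aligned, followed by an end-of-stream marker (the six bytes 0x177245385090, which is the square root of pi as binary-coded decimal (BCD)), a four-byte checksum, and empty bits for byte alignment.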
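
Both magic numbers are BCD: each 4-bit nibble holds one decimal digit, so writing the six bytes in hex spells out the digits directly. Here is a small Python check, with bcd_digits and trailer_bits as hypothetical helpers. The trailer arithmetic assumes the layout described in the quote: a 48-bit marker, a 32-bit checksum, then zero padding up to the next byte boundary.

```python
import math

BLOCK_MAGIC = 0x314159265359   # block header, "1AY&SY"
EOS_MAGIC = 0x177245385090     # end-of-stream marker


def bcd_digits(value: int, n_bytes: int = 6) -> str:
    """Decode a packed-BCD number: every 4-bit nibble is one decimal digit."""
    digits = value.to_bytes(n_bytes, "big").hex()
    if not digits.isdigit():
        raise ValueError("not valid BCD: a nibble is greater than 9")
    return digits


def trailer_bits(marker_bit_pos: int) -> int:
    """Bits taken by the 48-bit end-of-stream marker, the 32-bit checksum and
    the zero padding, if the marker starts at bit `marker_bit_pos`
    (hypothetical helper; layout taken from the quoted description)."""
    end = marker_bit_pos + 48 + 32
    padding = -end % 8          # zero bits up to the next byte boundary
    return 48 + 32 + padding


if __name__ == "__main__":
    for name, magic, target in (("block", BLOCK_MAGIC, math.pi),
                                ("end-of-stream", EOS_MAGIC, math.sqrt(math.pi))):
        digits = bcd_digits(magic)
        print(name, digits, int(digits) / 10**11, target)
    print("trailer starting at bit 1003 uses", trailer_bits(1003), "bits")
```

Read as 1 integer digit plus 11 decimals, the two markers agree with pi and the square root of pi to within about 1e-11. The padding term is why a stream, unlike a block, always ends on a byte boundary.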

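The bzip2 Wikipedia article also mentions that blocks are bit-aligned (see the File format section). This matches what I seem to remember from school (we had to implement the algorithm...).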
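
Putting the two quotes together: only the stream is byte-aligned, and the stream header ("BZh" plus the block-size digit '1'..'9') is exactly 4 bytes. So the first block of each stream starts on a byte boundary, while every later block begins at whatever bit the previous block ended on. Here is a Python sketch with made-up block lengths (the lengths and helper names are hypothetical) showing why a grep count can come out lower than the real number of blocks:

```python
# Sketch (hypothetical block sizes): where does each block start inside one
# bzip2 stream?  The stream header "BZh1".."BZh9" is 4 bytes, so the first
# block's magic starts at bit 32; every later block starts right after the
# previous block's last bit, with no padding in between.

STREAM_HEADER_BITS = 4 * 8


def block_start_offsets(block_bit_lengths: list[int]) -> list[int]:
    """Absolute bit offset of each block's 48-bit magic within the stream."""
    starts = []
    pos = STREAM_HEADER_BITS
    for length in block_bit_lengths:
        starts.append(pos)
        pos += length
    return starts


def byte_aligned_count(starts: list[int]) -> int:
    """How many block magics a byte-oriented search like grep could see."""
    return sum(1 for s in starts if s % 8 == 0)


if __name__ == "__main__":
    lengths = [7_180_413, 7_201_006, 7_150_922, 3_004_117]  # made-up sizes in bits
    starts = block_start_offsets(lengths)
    for i, s in enumerate(starts):
        print(f"block {i}: bit {s} (byte {s // 8}, +{s % 8} bits)")
    print("byte-aligned blocks:", byte_aligned_count(starts), "of", len(starts))
```

With these made-up lengths only the first block is byte-aligned. If block end positions were roughly uniform modulo 8, each later block would be visible to a byte search about one time in eight. That would make grep -c '1AY&SY' unreliable in both directions: it can miss unaligned blocks and can also count accidental matches.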

Bzip2 block header: 1AY&SY
