Is GZipStream header reliable across .NET versions?

Time: 2015-07-28 15:57:17

Tags: c# .net gzip gzipstream

I came across the Q&A "Is there a way to know if the byte[] has been compressed by gzipstream?", where one author states (correctly) that GZipStream writes the bytes {0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 4, 0} as a header, which can be used to tell whether a byte array holds compressed data.

My question is: is the GZipStream header reliable across .NET versions?

3 answers:

Answer 0 (score: 3)

It should be reliable, because this header comes from the GZip specification and is therefore not .NET specific. The specification (RFC 1952) explains these values.

However, according to the specification, only the first two bytes are actually always the same. The third byte is practically always the same, because currently only one valid value exists. The remaining bytes might change.
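A minimal sketch of a check that relies only on those three stable bytes. The class and method names (`GZipMagic`, `LooksLikeGZip`) are made up for illustration, and a match only means the data might be gzip:

```csharp
using System;

public static class GZipMagic
{
    // ID1, ID2 and the compression method (8 = deflate).
    private static readonly byte[] Signature = { 0x1f, 0x8b, 0x08 };

    // Returns true when the buffer starts with the three stable header bytes.
    // A match means "possibly gzip", not "definitely gzip".
    public static bool LooksLikeGZip(byte[] data)
    {
        if (data == null || data.Length < Signature.Length)
        {
            return false;
        }

        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

Comparing all ten bytes from the question instead would tie the check to one particular runtime, compression level and operating system.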

Answer 1 (score: 3)

With any GZip format stream you are guaranteed:

First two bytes: 1f, 8b

Next byte: the compression method. The gzip tool's source names 00 for store (no compression), 01 for compress, 02 for pack, 03 for lzh and 08 for deflate, but the specification (RFC 1952) reserves 0 to 7 and defines only 08. .NET has so far always used deflate, and many situations expect only deflate (web clients expect only deflate-based gzip for a transfer or content encoding marked as gzip), so it is unlikely to change without some sort of option being added to specify it.

The next byte holds flags. One bit marks the content as "probably some sort of text file", and others announce optional fields such as an original file name or a comment. Since GZipStream has no information of that kind, it always writes 00.

The next four are the file-modification time in Unix format. Again, since the class has no information about the file (it receives a stream, not a file with metadata), these are always set to 0.

The next byte depends on the compression method. With deflate it could be 2 to indicate heavy compression or 4 to indicate light compression.

The next (last in your sequence) depends on the OS type in use. 0 means "FAT Filesystem" but has continued to be used by Windows as Windows has moved to use other file systems like NTFS. It could potentially have a different value if used with Mono on a non-Windows file system, though that situation could also potentially decide to match the .NET behaviour. (Update: At least some versions of Mono will set the file-system flag to something other than 0 on non-Windows systems).
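The layout described above can be sketched as a small parser for the fixed 10-byte header. This is an illustration only: the type `GZipHeader` and its member names are hypothetical, chosen to mirror the field names used in RFC 1952 (CM, FLG, MTIME, XFL, OS), and optional fields that may follow the fixed part are not handled.

```csharp
using System;

public sealed class GZipHeader
{
    public byte CompressionMethod { get; private set; } // CM: 8 = deflate
    public byte Flags { get; private set; }             // FLG: 0 = no optional fields
    public uint ModificationTime { get; private set; }  // MTIME: Unix seconds, 0 = unknown
    public byte ExtraFlags { get; private set; }        // XFL: 2 or 4 with deflate
    public byte Os { get; private set; }                // OS: 0 = FAT file system

    // Reads the fixed 10-byte header. Returns false when the buffer is too
    // short or does not start with the magic bytes 0x1f 0x8b.
    public static bool TryParse(byte[] data, out GZipHeader header)
    {
        header = null;
        if (data == null || data.Length < 10 || data[0] != 0x1f || data[1] != 0x8b)
        {
            return false;
        }

        header = new GZipHeader
        {
            CompressionMethod = data[2],
            Flags = data[3],
            // MTIME is stored least significant byte first.
            ModificationTime = (uint)(data[4] | data[5] << 8 | data[6] << 16 | data[7] << 24),
            ExtraFlags = data[8],
            Os = data[9]
        };
        return true;
    }
}
```

Parsing the sequence from the question this way gives compression method 8, flags 0, modification time 0, extra flags 4 and OS 0, which makes it easy to see which fields differ when another runtime produces a different header.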

Answer 2 (score: 1)

A gzip stream is assured to start with 0x1f 0x8b 0x08. No compression method other than 0x08 is supported in the third byte.

So if you don't see 0x1f 0x8b 0x08, then it's not a gzip stream. However if you do see 0x1f 0x8b 0x08, then it may or may not be a gzip stream. It probably is, but you can't assume that.

What you should do with a candidate gzip file is to simply start decompressing it as such. The decoder will immediately recognize if there is no gzip header, and will furthermore soon detect a problem in the compressed data if there is an accidental gzip header. You shouldn't have to check for the header, since the decoder already does, as well as check for valid compressed data after that.
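A rough sketch of that approach with GZipStream, assuming the whole candidate fits in memory. The helper name `GZipProbe.TryDecompress` is hypothetical; it relies on GZipStream throwing `InvalidDataException` when the header or the compressed data is invalid, and how empty or truncated input is treated may differ between runtime versions.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GZipProbe
{
    // Attempts to decompress the candidate as gzip. Returns false when the
    // decoder rejects the header or the compressed data.
    public static bool TryDecompress(byte[] candidate, out byte[] result)
    {
        result = null;
        if (candidate == null)
        {
            return false;
        }

        try
        {
            using (var input = new MemoryStream(candidate))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                gzip.CopyTo(output);
                result = output.ToArray();
                return true;
            }
        }
        catch (InvalidDataException)
        {
            // Not a gzip stream, or the data after the header is corrupt.
            return false;
        }
    }
}
```

A caller can then fall back to treating the bytes as uncompressed whenever the method returns false, without inspecting the header itself.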