Question

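Hopefully you have heard of the neat hack that lets you combine a JPG and a Zip file into a single file that is a valid (or at least readable) file in both formats. Well, I realized that since JPG allows arbitrary data at the end and ZIP allows it at the beginning, you can put one more format in there: in the middle. For the purposes of this question, assume the middle data is arbitrary binary data guaranteed not to conflict with the JPG or ZIP formats (meaning it does not contain the magic zip header 0x04034b50). Illustration: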

0xFFD8 <- start jpg data end -> 0xFFD9 ... ARBITRARY BINARY DATA ... 0x04034b50 <- start zip file ... EOF

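I am doing this:

cat "mss_1600.jpg" filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb "null.bytes" "randomzipfile.zip" > temp.zip

This produces a 6,318 KB file. It does not open in 7-Zip. However, when I use one fewer 'pair' (so 12 filea/fileb pairs instead of 13):

cat "mss_1600.jpg" filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb filea fileb "null.bytes" "randomzipfile.zip" > temp.zip

It produces a 5,996 KB file that does open in 7-Zip.

So I know my arbitrary binary data does not contain a magic Zip file header to screw things up. I have reference files for a working jpg+data+zip and a non-working jpg+data+zip (use "save as", because the browser thinks they are images, and add the zip extension yourself).

I want to know why it fails with 13 combinations but not with 12. For bonus points, I need to work around this somehow.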

Answer 1

I downloaded the source code for 7-Zip and figured out what causes this.

In CPP/7zip/UI/Common/OpenArchive.cpp, you will see the following:

// Static-SFX (for Linux) can be big.
const UInt64 kMaxCheckStartPosition = 1 << 22;

That means only the first 4194304 bytes of the file are searched for the header. If it is not found there, 7-Zip considers the file invalid.
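To see on which side of that limit a given file falls, a small check like the following may help. It is only a sketch: it assumes the header being searched for is the ZIP local file header signature (0x04034b50, stored on disk as the bytes 50 4B 03 04), and the function names and the exact boundary handling are illustrative, not taken from the 7-Zip source.

```python
# Value quoted from OpenArchive.cpp above: 1 << 22 == 4194304 bytes (4 MiB).
K_MAX_CHECK_START_POSITION = 1 << 22

# 0x04034b50 written little-endian, i.e. the bytes "PK\x03\x04".
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def first_zip_header_offset(data: bytes) -> int:
    """Return the offset of the first ZIP local file header, or -1 if absent."""
    return data.find(ZIP_LOCAL_HEADER)


def within_search_window(data: bytes, limit: int = K_MAX_CHECK_START_POSITION) -> bool:
    """Hypothetical check: does the first ZIP header start inside the first `limit` bytes?"""
    offset = first_zip_header_offset(data)
    return 0 <= offset < limit
```

Running `within_search_window(open("temp.zip", "rb").read())` on the two files from the question should show whether the extra pair of files pushed the ZIP header past the 4 MiB mark.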

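You can double this limit by changing 1 << 22 to 1 << 23. I tested that change by rebuilding 7-Zip, and it works.

EDIT: To work around this, you can download the source, make the change above, and build it. I built it with VS 2008. Open the VS command prompt, navigate to extracted-source-location\CPP\7zip\Bundles and type 'nmake'. Then, in the Alone directory, run '7za t nonworking.jpg' and you should see 'Everything is Ok'.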

Answer 2

This is really a two-part answer :)

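First off, no matter what people say, a zip file technically cannot be placed verbatim at the end of another file. The end of central directory record contains a value giving the byte offset of the central directory from the start of the current disk (if you have only one .zip file, that means the current file). A lot of processors ignore this value, but Windows' zip folders do not, so you need to correct it for the file to work in Windows Explorer (not that you might care ;P). See the Zip APPNOTE for details of the file format. Basically, in a hex editor (or with a tool you write) you find the "offset of start of central directory with respect to the starting disk number" value. Then you find the first "central file header signature" (504b0102 in hex) and set the value to that offset.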
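The hex-editor procedure above can be sketched in Python roughly as follows. This is an illustration under assumptions, not a robust tool: it locates records by naively searching for their signatures (so it can be fooled if the same bytes occur inside the JPG or the compressed data), it ignores ZIP64, and the function names are made up for this example.

```python
import struct

EOCD_SIG = b"PK\x05\x06"     # end of central directory record signature
CENTRAL_SIG = b"PK\x01\x02"  # central file header signature (504b0102)


def central_directory_offsets(data: bytes):
    """Return (offset stored in the end record, actual offset of the first central header)."""
    eocd = data.rfind(EOCD_SIG)
    if eocd == -1:
        raise ValueError("no end of central directory record found")
    # The 4-byte little-endian offset field starts 16 bytes into the record.
    (stored,) = struct.unpack_from("<I", data, eocd + 16)
    actual = data.find(CENTRAL_SIG)
    return stored, actual


def fix_central_directory_offset(data: bytes) -> bytes:
    """Overwrite the stored offset with the position of the first central header."""
    eocd = data.rfind(EOCD_SIG)
    _, actual = central_directory_offsets(data)
    if actual == -1:
        raise ValueError("no central file header found")
    return data[:eocd + 16] + struct.pack("<I", actual) + data[eocd + 20:]
```

On a file where a zip was simply concatenated after other data, the two values returned by `central_directory_offsets` should differ by the number of bytes placed in front of the zip.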

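Alas, that does not fix 7zip, but this is due to the way 7zip tries to guess the file format. Basically, it searches only the first ~4MiB for the binary sequence 504b0304; if it does not find it, it assumes the file is not a Zip and tries its other archive formats. This is obviously why adding one more file breaks things: it pushes the header past the search limit.

To fix it, you need to add that hex string to the jpeg without breaking it. One way of doing this is to add the following hex data just after the FFD8 JPEG SOI header: FFEF0005504B030400. This adds a custom block containing your sequence and is correct, so JPEG readers should ignore it.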

Answer 3

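So for anyone else finding this question, here is the story:

Yes, Andy is exactly right about why 7-Zip fails on the file, but that does not help my problem, since I cannot get people to use my version of 7-Zip.

tyranid, however, got me to the solution.

First, he suggested adding a small byte string to the JPG so that 7-Zip will open it. However, his string is slightly off from a valid JPG segment; it needs to be FFEF00 07 504B030400. The length was off by 2 bytes.
This lets 7-Zip open the file, but not extract files from it; it fails silently. That is because the entries in the central directory have internal pointers/offsets that point to the file entries. Since you put a bunch of stuff in front of them, you need to correct all of those pointers!
To get the zip to open with Windows' built-in zip support, you need to, as tyranid said, correct the "offset of start of central directory with respect to the starting disk number". Here is a python script that does the last two, though it is a fragment, not copy-paste-ready-to-use.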


# Now we need to read the file and rewrite all the zip headers.  Fun!
# This is still a fragment: magicfilename must already hold the path of the
# combined JPG+data+ZIP file.
from struct import pack, unpack

CENTRAL_SIG = b'\x50\x4B\x01\x02'  # central directory file header signature
LOCAL_SIG = b'\x50\x4B\x03\x04'    # local file header signature

with open(magicfilename, 'rb') as torewrite:
    magicdata = torewrite.read()

# Change the central directory's offset
offsetOfCentralRepro = magicdata.find(CENTRAL_SIG)  # this is the beginning of the central directory
# It so happens that on my files the offset is stored 2 bytes from the end:
# ...data OF FS ET !! 00 00 EOF, where OFFSET!! is the 4 offset bytes and
# 00 00 are the last two bytes of the file.
start = len(magicdata) - 6
magicdata = magicdata[:start] + pack('<I', offsetOfCentralRepro) + magicdata[start+4:]

# Now change the individual offsets in the central directory entries
startOfCentralDirectoryEntry = magicdata.find(CENTRAL_SIG, 0)  # find the first central directory entry
# Find the first file entry (we start at 10 to skip past the fake entry in the jpg)
startOfFileDirectoryEntry = magicdata.find(LOCAL_SIG, 10)
while startOfCentralDirectoryEntry != -1:  # find() returns -1 when nothing is left
    # Move a magic number of bytes past the start of the entry (really! It's 42!)
    # to reach the 4-byte offset of the matching file entry
    startOfCentralDirectoryEntry = startOfCentralDirectoryEntry + 42

    # Get the current offset just to output something to the terminal
    (oldoffset,) = unpack('<I', magicdata[startOfCentralDirectoryEntry : startOfCentralDirectoryEntry+4])
    print("Old Offset: ", oldoffset, " New Offset: ", startOfFileDirectoryEntry, " at ", startOfCentralDirectoryEntry)
    # Now replace it
    magicdata = magicdata[:startOfCentralDirectoryEntry] + pack('<I', startOfFileDirectoryEntry) + magicdata[startOfCentralDirectoryEntry+4:]

    # Move to the next central directory entry, and the next file entry
    startOfCentralDirectoryEntry = magicdata.find(CENTRAL_SIG, startOfCentralDirectoryEntry)
    startOfFileDirectoryEntry = magicdata.find(LOCAL_SIG, startOfFileDirectoryEntry+1)

# Finally write the rewritten headers' data
with open(magicfilename, 'wb') as towrite:
    towrite.write(magicdata)
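For the first step of this answer, the small byte string placed right after the JPEG SOI marker, a minimal sketch might look like this. It relies on the fact that a JPEG segment's two-byte big-endian length field counts itself plus the payload, which is why the five payload bytes 504B030400 give a length of 7. The helper names are hypothetical and not part of the script above.

```python
import struct

SOI = b"\xff\xd8"                     # JPEG start-of-image marker
FAKE_ZIP_PAYLOAD = b"PK\x03\x04\x00"  # 504B030400, the bytes 7-Zip looks for


def build_marker_segment(payload: bytes, marker: int = 0xEF) -> bytes:
    """Build FF <marker> <length> <payload>; length = payload size + 2 length bytes."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


def insert_after_soi(jpeg: bytes, segment: bytes) -> bytes:
    """Return the JPEG data with `segment` inserted directly after the SOI marker."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing FFD8 start-of-image marker")
    return jpeg[:2] + segment + jpeg[2:]
```

With the default arguments, `build_marker_segment(FAKE_ZIP_PAYLOAD)` yields the bytes FFEF0007504B030400, and the fake zip signature ends up at offset 6 of the file, which is why the script above starts looking for real file entries at offset 10.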

Answer 4

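You can use DotNetZip to produce a hybrid JPG+ZIP file. DotNetZip can save to a stream, and it is smart enough to recognize the original offset of a pre-existing stream before it starts writing zip content into it. So, in pseudocode, you can get a JPG+ZIP this way: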

 open stream on an existing JPG file for update
 seek to the end of that stream
 open or create a zip file
 call ZipFile.Save to write zip content to the JPG stream
 close

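All the offsets are calculated correctly. The same technique is used to produce a self-extracting archive: you can open a stream on the EXE, seek to the end, and write the ZIP content into that stream. If you do it this way, all the offsets come out right.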
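The same idea can be sketched without DotNetZip. The example below swaps in Python's standard zipfile module, whose append mode ("a") adds a new archive to the end of a file that is not already a ZIP archive. The function name and the in-memory buffer are choices made for this illustration only; it says nothing about how DotNetZip works internally.

```python
import io
import zipfile


def append_zip_to_jpeg(jpeg: bytes, members: dict) -> bytes:
    """Return the JPEG bytes followed by a new ZIP archive.

    `members` maps archive member names to their contents as bytes.
    """
    buffer = io.BytesIO(jpeg)
    # Mode "a" on data that is not a ZIP archive seeks to the end and appends
    # a new archive there, so the JPEG bytes in front stay untouched.
    with zipfile.ZipFile(buffer, "a") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

Because the archive is written into the existing stream rather than concatenated afterwards, the offsets recorded in the zip structures are measured from the start of the combined data, so no after-the-fact patching should be needed.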

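One other thing, regarding a comment in another post: a ZIP can have arbitrary data at the end of the file as well as at the beginning. As far as I know, there is no requirement that the zip central directory be at the end of the file, although that is the typical case.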

JPG + Zip file combination problem with the Zip format
