Question

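My objective is to convert a 32-bit bitmap (BGRA) buffer into a PNG image in real time using C/C++. To achieve this, I used the libpng library to convert the bitmap buffer and then write it to a PNG file. However, it takes a long time (~5 seconds) to run in a single thread on the target ARM board (quad-core processor). On profiling, I found that the libpng compression step (the deflate algorithm) takes more than 90% of the time, so I tried to reduce it with some form of parallelization. The end goal is to get it done in under 0.5 seconds.

Since a PNG can have multiple IDAT chunks, I thought of writing the PNG with multiple IDATs compressed in parallel. To write a custom PNG file with multiple IDATs, I adopted the following method: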

   1. Write PNG IHDR chunk
   2. Write IDAT chunks in parallel
      i.   Split input buffer into 4 parts.
      ii.  Compress each part in parallel using the zlib "compress" function.
      iii. Compute CRC of chunk { "IDAT" + zlib compressed data }.
      iv.  Create IDAT chunk, i.e. { "IDAT" + zlib compressed data + CRC }.
      v.   Write length of IDAT chunk created.
      vi.  Write complete chunk in sequence.
   3. Write IEND chunk
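
For reference, steps iii to vi could look roughly like the sketch below. It is a minimal, self-contained illustration, not libpng code: `png_crc32`, `put_be32` and `png_build_chunk` are hypothetical helper names, and the CRC is a plain bitwise CRC-32 rather than a table-driven one. Note that the length field counts only the chunk data, while the CRC covers the chunk type plus the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 as used by PNG (reflected polynomial 0xEDB88320).
 * Pass 0 as the initial value for a fresh checksum. */
static uint32_t png_crc32(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Serialise one chunk into `out` as length + type + data + CRC.
 * `out` must have room for len + 12 bytes. Returns bytes written. */
static size_t png_build_chunk(uint8_t *out, const char type[4],
                              const uint8_t *data, uint32_t len)
{
    assert(out != NULL && type != NULL);
    put_be32(out, len);                  /* length of the data only */
    memcpy(out + 4, type, 4);            /* chunk type, e.g. "IDAT" */
    if (len > 0)
        memcpy(out + 8, data, len);      /* chunk data */
    /* CRC covers the type and the data, not the length field. */
    put_be32(out + 8 + len, png_crc32(0, out + 4, (size_t)len + 4));
    return (size_t)len + 12;
}
```

Calling `png_build_chunk(out, "IEND", NULL, 0)` produces the familiar 12-byte chunk that ends every PNG file, which is a quick way to check the CRC routine.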

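The problem is that the PNG file created by this method is invalid or corrupted. Can somebody point out:

1. What am I doing wrong?
2. Is there a fast implementation of zlib compression or of multi-threaded PNG creation, preferably in C/C++?
3. Is there any other way to reach the target?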

Note: the PNG specification is followed when creating the chunks.

Update: this method works for creating the IDAT in parallel

    1. add one filter byte before each row of input image. 
    2. split image into four equal parts. <-- may not be required; passing pointers into the buffer plus their offsets is enough
    3. Compress Image Parts in parallel
            (A) for first image part
                --deflateInit(zstrm, Z_BEST_SPEED)
                --deflate(zstrm, Z_FULL_FLUSH)
                --deflateEnd(zstrm)
                --store compressed buffer and its length
                --store adler32 for current chunk, {a1=zstrm->adler} <--adler is of uncompressed data
            (B) for second and third image part
                --deflateInit(zstrm, Z_BEST_SPEED)
                --deflate(zstrm, Z_FULL_FLUSH)
                --deflateEnd(zstrm)
                --store compressed buffer and its length
                --strip first 2 bytes, reduce length by 2
                --store adler32 for current chunk zstrm->adler, {a2,a3 similar to A} <--adler is of uncompressed data
            (C) for last image part
                --deflateInit(zstrm, Z_BEST_SPEED)
                --deflate(zstrm, Z_FINISH)
                --deflateEnd(zstrm)
                --store compressed buffer and its length
                --strip first 2 bytes and last 4 bytes of buffer, reduce length by 6
                --here last 4 bytes should be equal to zstrm->adler, {a4=zstrm->adler} <--adler is of uncompressed data

    4. adler32_combine() all four parts, i.e. a1, a2, a3 & a4 <--last arg is the length of the uncompressed data that the 2nd arg's adler32 was computed over
    5. store total length of compressed buffers <--to be used in calculating CRC of complete IDAT & to be written before IDAT in file
    6. Append "IDAT" to Final chunk
    7. Append all four compressed parts in sequence to Final chunk
    8. Append adler32 checksum computed in step 4 to Final chunk
    9. Append CRC of Final chunk i.e.{"IDAT"+data+adler}

    To be written in png file in this manner: [PNG_HEADER][PNG_DATA][PNG_END]
    where [PNG_DATA] ->Length(4-bytes)+{"IDAT"(4-bytes)+data+adler(4-bytes)}+CRC(4-bytes)
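
Steps 1 and 2 of the update can be sketched as below. This is only an illustration under a few assumptions: the helper names (`filtered_size`, `add_filter_bytes`, `part_rows`) are made up, filter type 0 (None) is used for every row, and the BGRA to RGBA swap is my addition because a hand-written writer has to store samples in R, G, B, A order. Splitting on whole rows is a convenience here, not a requirement of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the filtered image: one filter byte plus 4 bytes per pixel, per row. */
static size_t filtered_size(size_t width, size_t height)
{
    return height * (1 + 4 * width);
}

/* Copy a tightly packed BGRA image into `dst`, prefixing each row with
 * filter type 0 (None) and reordering every pixel to RGBA. */
static void add_filter_bytes(uint8_t *dst, const uint8_t *bgra,
                             size_t width, size_t height)
{
    assert(dst != NULL && bgra != NULL);
    for (size_t y = 0; y < height; y++) {
        *dst++ = 0;                      /* filter byte for this row */
        for (size_t x = 0; x < width; x++) {
            dst[0] = bgra[2];            /* R */
            dst[1] = bgra[1];            /* G */
            dst[2] = bgra[0];            /* B */
            dst[3] = bgra[3];            /* A */
            dst += 4;
            bgra += 4;
        }
    }
}

/* Row range [*first, *first + *count) handled by part `i` of `parts`.
 * Leftover rows are given to the earliest parts, one each. */
static void part_rows(size_t height, unsigned parts, unsigned i,
                      size_t *first, size_t *count)
{
    assert(parts > 0 && i < parts && first != NULL && count != NULL);
    size_t base = height / parts;
    size_t extra = height % parts;
    *first = i * base + (i < extra ? i : extra);
    *count = base + (i < extra ? 1 : 0);
}
```

Each worker can then compress `count * (1 + 4 * width)` bytes starting at offset `first * (1 + 4 * width)` in the filtered buffer, so no copying into separate part buffers is needed.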

Answer 1

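Even when there are multiple IDAT chunks in a PNG datastream, together they still contain a single zlib-compressed datastream. The first two bytes of the first IDAT are the zlib header, and the final four bytes of the final IDAT are the zlib "adler32" checksum for the entire datastream (excluding the 2-byte header), computed on the data before it is compressed.

There is a parallel gzip (pigz) under development at zlib.net/pigz. When invoked as "pigz -z", it generates zlib datastreams instead of gzip datastreams.

For that you won't need to split up your input file, because the parallel compression happens inside pigz.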
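
To make that layout concrete, here is a small sketch that inspects the two ends of a complete zlib stream, i.e. the concatenated payload of all IDAT chunks. The function names are hypothetical. The header test follows the zlib format rules: compression method 8 (deflate), a window size code of at most 7, and the two header bytes read as a big-endian number being a multiple of 31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two bytes at `p` form a plausible zlib header. */
static int zlib_header_ok(const uint8_t *p)
{
    assert(p != NULL);
    unsigned cmf = p[0];
    unsigned flg = p[1];
    return (cmf & 0x0Fu) == 8u               /* CM = 8: deflate */
        && (cmf >> 4) <= 7u                  /* CINFO: window up to 32 KiB */
        && ((cmf << 8) | flg) % 31u == 0u;   /* FCHECK makes it divisible by 31 */
}

/* Reads the big-endian Adler-32 stored in the last four bytes of a
 * complete zlib stream of `len` bytes. */
static uint32_t zlib_trailer_adler(const uint8_t *stream, size_t len)
{
    assert(stream != NULL && len >= 6);
    const uint8_t *t = stream + len - 4;
    return ((uint32_t)t[0] << 24) | ((uint32_t)t[1] << 16)
         | ((uint32_t)t[2] << 8)  |  (uint32_t)t[3];
}
```

A file built from independently compressed parts fails exactly here: every part carries its own header and its own checksum, so a decoder sees extra header and trailer bytes in the middle of what should be one stream.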

Answer 2

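In your step ii, you need to use deflate(), not compress(). Use Z_FULL_FLUSH on the first three parts and Z_FINISH on the last part. Then you can concatenate them into a single stream, after pulling the two-byte header off the last three (keep the header on the first one) and pulling the four-byte check value off the last one. For all of them, you can get the check value from strm->adler. Save those.

Use adler32_combine() to combine the four check values you saved into a single check value for the complete input. You can then tack that onto the end of the stream.

And there you have it.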
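
The concatenation step could be sketched like this. It assumes the parts were already produced as described: every part except the last was flushed with Z_FULL_FLUSH and so has a header but no trailer, and the last part was ended with Z_FINISH. `struct zpart` and `join_zlib_parts` are illustrative names, not zlib API, and the combined check value is passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One independently compressed part, as written by deflate(). */
struct zpart {
    const uint8_t *data;
    size_t len;
};

/* Joins `n` parts into one zlib stream in `out` and returns its length.
 * `out` must be able to hold the sum of all part lengths. */
static size_t join_zlib_parts(uint8_t *out, const struct zpart *parts,
                              size_t n, uint32_t total_adler)
{
    assert(out != NULL && parts != NULL && n > 0);
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        const uint8_t *p = parts[i].data;
        size_t len = parts[i].len;
        if (i > 0) {                 /* keep the header of the first part only */
            assert(len >= 2);
            p += 2;
            len -= 2;
        }
        if (i == n - 1) {            /* drop the last part's own check value */
            assert(len >= 4);
            len -= 4;
        }
        memcpy(out + pos, p, len);
        pos += len;
    }
    /* Append the check value of the complete input, big-endian. */
    out[pos++] = (uint8_t)(total_adler >> 24);
    out[pos++] = (uint8_t)(total_adler >> 16);
    out[pos++] = (uint8_t)(total_adler >> 8);
    out[pos++] = (uint8_t)total_adler;
    return pos;
}
```

The result is what goes into the IDAT chunk (or chunks) as data. Z_FULL_FLUSH matters because it ends each part on a byte boundary and resets the compressor's history, so the parts do not depend on each other.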
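
To show what that combination does, here is a self-contained sketch that re-derives it without zlib. In real code you would simply call zlib's `adler32_combine(adler1, adler2, len2)`; `adler32_bytes` and `adler32_join` below are stand-in names for illustration only. With A as the low 16 bits and B as the high 16 bits, appending a block of `len2` bytes gives A = A1 + A2 - 1 and B = B1 + B2 + len2 * (A1 - 1), all modulo 65521.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u   /* largest prime below 2^16 */

/* Straightforward (slow) Adler-32 of a buffer, starting from the value 1. */
static uint32_t adler32_bytes(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_BASE;
        b = (b + a) % ADLER_BASE;
    }
    return (b << 16) | a;
}

/* Adler-32 of X followed by Y, given adler(X), adler(Y) and the
 * uncompressed length of Y. */
static uint32_t adler32_join(uint32_t ad1, uint32_t ad2, uint64_t len2)
{
    uint32_t a1 = ad1 & 0xFFFFu, b1 = ad1 >> 16;
    uint32_t a2 = ad2 & 0xFFFFu, b2 = ad2 >> 16;
    uint64_t rem = len2 % ADLER_BASE;
    uint32_t a1m1 = (a1 + ADLER_BASE - 1) % ADLER_BASE;   /* (A1 - 1) mod BASE */
    uint32_t a = (a1m1 + a2) % ADLER_BASE;
    uint64_t b = (b1 + b2 + rem * a1m1) % ADLER_BASE;
    return ((uint32_t)b << 16) | a;
}
```

For four parts the values are folded left to right: join a1 with a2 using the length of part 2, join that result with a3 using the length of part 3, and so on. The lengths are those of the uncompressed parts, including the filter bytes.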

Fast encoding of a bitmap buffer to PNG using libpng
