Question

我正在尝试将除8位以外的非隔行扫描GIF图像添加到PDF文档，而必须使用PDF::Create为Perl完全解码比特流。

属于PDF standard的LZWDecode算法要求所有图片的最小LZW代码大小为8位，PDF::Create硬编码为仅嵌入8-位图。

到目前为止，我已经从PDF::Create调整image loader来读取5位图像并完全解码LZW流。然后，我可以使用PDF::Create中的编码器算法将图像重新打包为8位。

我想做的是消除内存密集型解码/编码步骤。 This thread表示可以通过“扩大或移位”来使LZW代码适合LZWDecode的长度。

我联系了线程作者并提供了一些额外的细节，特别是颜色索引的代码保持不变但用零填充（例如，[10000]变为[000010000]），{{1 }和<Clear>代码分别更改为<End>和<256>，并且所有其他代码都被256 - 原始<257>代码偏移。

但是，由于雇主的限制，他无法进一步详细说明。特别是，当修改后的值超过<Clear>（LZW代码表的最大索引）时，我不确定如何处理代码。我也不确定如何将修改后的代码重新打包成比特流。

我目前使用的算法如下。

<4095>

Answer 1

一次做一个很慢。一次完成这些操作会给你带来太多的记忆。一次做一大块。

my $BUFFER_SIZE = 5 * 50_000;  # Must be a multiple of 5.

my $in_bytes = ...;
my $out_bytes = '';
while (my ($bytes) = $in_bytes =~ s/^(.{1,$BUFFER_SIZE})//s) {
   # Unpack from 5 bit fields.
   my @vals = map { pack('B*', "000$_") } unpack('B*', $bytes) =~ /(.{5})/g;

   # Transform @vals into 8 bit values here.

   # Pack to 8 bit fields.
   $out_bytes .= pack('C*', @vals);

}

由于您根本没有转换值（只是它们的存储方式），因此简化为：

my $BUFFER_SIZE = 5 * 50_000;  # Must be a multiple of 40.

my $in_bytes = ...;
my $out_bytes = '';
while (my ($bytes) = $in_bytes =~ s/^(.{1,$BUFFER_SIZE})//s) {
   # Unpack from 5 bit fields.
   my $bits = unpack('B*', $bytes);
   $bits =~ s/(.{5})/000$1/g;
   $out_bytes .= pack('B*', $bits);

}

您没有说明如何处理额外的比特。我只是忽略了它们。

没有位字符串创建的替代方法：

my $in_bytes = ...;
my $out_bytes = '';
while (my ($bytes) = $in_bytes =~ s/^(.{1,5})//s) {
    my @bytes = map ord, split //, $bytes;

    # 00000111 11222223 33334444 45555566 66677777

    $out_bytes .= chr(                            (($bytes[0] >> 3) & 0x1F));
    last if @bytes == 1;
    $out_bytes .= chr((($bytes[0] << 2) & 0x1C) | (($bytes[1] >> 6) & 0x03));
    $out_bytes .= chr(                            (($bytes[1] >> 1) & 0x1F));
    last if @bytes == 2;
    $out_bytes .= chr((($bytes[1] << 4) & 0x10) | (($bytes[2] >> 4) & 0x0F));
    last if @bytes == 3;
    $out_bytes .= chr((($bytes[2] << 1) & 0x1E) | (($bytes[3] >> 7) & 0x01));
    $out_bytes .= chr(                            (($bytes[3] >> 2) & 0x1F));
    last if @bytes == 4;
    $out_bytes .= chr((($bytes[3] << 3) & 0x18) | (($bytes[4] >> 5) & 0x07));
    $out_bytes .= chr(                            ( $bytes[4]       & 0x1F));
}

上述解决方案的优点在于它在C中特别有效。

STRLEN in_len;
const char* in = SvPVbyte(sv, in_len);

STRLEN out_len = (in_len * 8 / 5) * 8;
char* out = (char*)malloc(out_len);

char* out_cur = out;
char* in_end = in + in_len;

while (in != in_end) {
    *(out_cur++) =                          ((*in >> 3) & 0x1F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 2) & 0x1C) | ((*in >> 6) & 0x03));
    *(out_cur++) =                          ((*in >> 1) & 0x1F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 4) & 0x10) | ((*in >> 4) & 0x0F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 1) & 0x1E) | ((*in >> 7) & 0x01));
    *(out_cur++) =                          ((*in >> 2) & 0x1F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 3) & 0x18) | ((*in >> 5) & 0x07));
    *(out_cur++) =                          ( *in       & 0x1F));
}

return newSVpvn(out, out_len);

使用Perl将8位以外的GIF图像添加到PDF

1 个答案: