Benchmarking Zlib: Java vs. C

Time: 2014-07-24 21:19:53

Tags: java c zlib deflate

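I'm trying to speed up a TIFF encoder originally written in Java by switching to C. I compiled Zlib 1.2.8 with Z_SOLO defined and a minimal set of C files: adler32.c, crc32.c, deflate.c, trees.c and zutil.c. The Java version uses java.util.zip.Deflater.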

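I wrote a simple test program to assess performance in terms of compression level and speed. I was puzzled that, whatever level I request, compression barely improves, considering the increasing amount of time that higher levels take. I was also surprised that Java actually does better, in both compression and speed, than a Visual Studio Release build (VC2010):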

Java:

Level 1 : 8424865 => 6215200 (73,8%) in 247 cycles.
Level 2 : 8424865 => 6178098 (73,3%) in 254 cycles.
Level 3 : 8424865 => 6181716 (73,4%) in 269 cycles.
Level 4 : 8424865 => 6337236 (75,2%) in 334 cycles.
Level 5 : 8424865 => 6331902 (75,2%) in 376 cycles.
Level 6 : 8424865 => 6333914 (75,2%) in 395 cycles.
Level 7 : 8424865 => 6333350 (75,2%) in 400 cycles.
Level 8 : 8424865 => 6331986 (75,2%) in 437 cycles.
Level 9 : 8424865 => 6331598 (75,2%) in 533 cycles.

C:

Level 1 : 8424865 => 6215586 (73.8%) in 298 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 309 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 331 cycles.
Level 4 : 8424865 => 6337942 (75.2%) in 406 cycles.
Level 5 : 8424865 => 6339203 (75.2%) in 457 cycles.
Level 6 : 8424865 => 6337100 (75.2%) in 481 cycles.
Level 7 : 8424865 => 6336396 (75.2%) in 492 cycles.
Level 8 : 8424865 => 6334293 (75.2%) in 547 cycles.
Level 9 : 8424865 => 6333084 (75.2%) in 688 cycles.

Am I the only one seeing results like this? My guess is that the Zlib in the JVM uses assembly-level optimizations that I'm not including in my C project, or that I'm missing an obvious configuration step when compiling Zlib (or that the Visual Studio compiler is just bad).

Here are both snippets:

Java:

public static void main(String[] args) throws IOException {
    byte[] pix = Files.readAllBytes(Paths.get("MY_MOSTLY_UNCOMPRESSED.TIFF"));
    int szin = pix.length;
    byte[] buf = new byte[szin*101/100];
    int szout;
    long t0, t1;

    for (int i = 1; i <= 9; i++) {
        t0 = System.currentTimeMillis();
        Deflater deflater = new Deflater(i);
        deflater.setInput(pix);
        szout = deflater.deflate(buf);
        deflater.finish();
        t1 = System.currentTimeMillis();
        System.out.println(String.format("Level %d : %d => %d (%.1f%%) in %d cycles.", i, szin, szout, 100.0f*szout/szin, t1 - t0));
    }
}

C:

#include <time.h>

#define SZIN 9000000
#define SZOUT 10000000

void main(void)
{
    static unsigned char buf[SZIN];
    static unsigned char out[SZOUT];
    clock_t t0, t1;
    int i, ret;
    uLongf sz, szin;
    FILE* f = fopen("MY_MOSTLY_UNCOMPRESSED.TIFF", "rb");
    szin = fread(buf, 1, SZIN, f);
    fclose(f);

    for (i = 1; i <= 9; i++) {
        sz = SZOUT;
        t0 = clock();
        compress2(out, &sz, buf, szin, i); // I rewrote compress2, as it's not available when Z_SOLO is defined
        t1 = clock();
        printf("Level %d : %d => %d (%.1f%%) in %ld cycles.\n", i, szin, sz, 100.0f*sz/szin, t1 - t0);
    }
}

EDIT:

After @MarkAdler's comments, I tried different compression strategies through deflateInit2() (namely Z_FILTERED and Z_HUFFMAN_ONLY):

Z_FILTERED:

Level 1 : 8424865 => 6215586 (73.8%) in 299 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 310 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 330 cycles.
Level 4 : 8424865 => 6623409 (78.6%) in 471 cycles.
Level 5 : 8424865 => 6604616 (78.4%) in 501 cycles.
Level 6 : 8424865 => 6595698 (78.3%) in 528 cycles.
Level 7 : 8424865 => 6594845 (78.3%) in 536 cycles.
Level 8 : 8424865 => 6592863 (78.3%) in 595 cycles.
Level 9 : 8424865 => 6591118 (78.2%) in 741 cycles.

Z_HUFFMAN_ONLY:

Level 1 : 8424865 => 6803043 (80.7%) in 111 cycles.
Level 2 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 3 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 4 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 5 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 6 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 7 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 8 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 9 : 8424865 => 6803043 (80.7%) in 107 cycles.

As his comment predicted, Z_HUFFMAN_ONLY does not change compression from level to level but runs a lot faster. With my data, Z_FILTERED is not faster than the default strategy and compresses somewhat worse.

1 Answer:

Answer 0 (score: 5)

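The amount of compression and the deltas are not surprising for uncompressed image data that has essentially no matching strings. The portion of the data that is already compressed is not being compressed any further; it is being expanded slightly by some constant amount, so all of the variation comes from the uncompressed portion.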
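A rough way to see the "expanded slightly" effect is to feed deflate some data with no structure left to exploit. The sketch below uses pseudo-random bytes as a stand-in for already-compressed data. That is an assumption for illustration, not the asker's TIFF, and the class and method names are made up. It only uses java.util.zip.Deflater, and it calls finish() before draining the output so that the whole input is accounted for.

```java
import java.util.Random;
import java.util.zip.Deflater;

class IncompressibleDemo {
    /** Deflates all of data at the given level and returns the total output size. */
    static int deflatedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();                   // signal end of input before draining
        byte[] chunk = new byte[64 * 1024];  // output is discarded, only counted
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();                      // release native zlib memory
        return total;
    }

    /** Hypothetical stand-in for already-compressed data: seeded pseudo-random bytes. */
    static byte[] randomBytes(int n, long seed) {
        byte[] data = new byte[n];
        new Random(seed).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        byte[] data = randomBytes(1 << 20, 1L);
        for (int level = 1; level <= 9; level++) {
            int size = deflatedSize(data, level);
            System.out.printf("Level %d : %d => %d (%+d bytes)%n",
                    level, data.length, size, size - data.length);
        }
    }
}
```

On input like this the output should come out a little larger than the input at every level, which is the overhead being described above.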

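There is a change in algorithm between levels 3 and 4: level 3 goes with the first match it finds. When there are hardly any matching strings, that tends to minimize the overhead of sending string matches, hence the better compression. You might do better still by turning off string matching altogether with FILTERED or HUFFMAN_ONLY. HUFFMAN_ONLY has the added advantage of not even looking for matching strings, which speeds up compression significantly.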
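On the Java side, the same strategies can be selected with Deflater.setStrategy and the constants Deflater.DEFAULT_STRATEGY, Deflater.FILTERED and Deflater.HUFFMAN_ONLY. The following is a minimal sketch for comparing them, not the asker's benchmark. The class name, helper names and synthetic test data (noisy bytes with a skewed value distribution) are assumptions chosen only so the example runs on its own. Substitute real image data to get meaningful numbers.

```java
import java.util.Random;
import java.util.zip.Deflater;

class StrategyDemo {
    /** Deflates all of data with the given level and strategy; returns the output size. */
    static int deflatedSize(byte[] data, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(strategy);      // set before any input is compressed
        deflater.setInput(data);
        deflater.finish();                   // signal end of input before draining
        byte[] chunk = new byte[64 * 1024];  // output is discarded, only counted
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();                      // release native zlib memory
        return total;
    }

    /** Hypothetical test data: seeded Gaussian noise truncated to bytes. */
    static byte[] sampleData(int n) {
        Random rnd = new Random(42);
        byte[] data = new byte[n];
        for (int i = 0; i < n; i++) {
            data[i] = (byte) (rnd.nextGaussian() * 8);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sampleData(1 << 20);
        int[] strategies = {Deflater.DEFAULT_STRATEGY, Deflater.FILTERED, Deflater.HUFFMAN_ONLY};
        String[] names = {"DEFAULT_STRATEGY", "FILTERED", "HUFFMAN_ONLY"};
        for (int s = 0; s < strategies.length; s++) {
            for (int level : new int[] {1, 6, 9}) {
                long t0 = System.nanoTime();
                int size = deflatedSize(data, level, strategies[s]);
                long ms = (System.nanoTime() - t0) / 1_000_000;
                System.out.printf("%-16s level %d : %d => %d in %d ms%n",
                        names[s], level, data.length, size, ms);
            }
        }
    }
}
```

With HUFFMAN_ONLY the output size should be the same at every level, since no string matching is attempted. Timings from a single pass like this are only indicative, because JIT warm-up and timer resolution both affect short runs.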

As for the speed difference, I can only guess that different compilers or different compiler optimizations were used.