Question

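I am running some performance tests on an HTML stripper (written in Java). That is, I pass a string (actually HTML content) to a method of the HTML stripper, and the method returns plain text (without HTML tags and meta information).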

Here is an example of the concrete implementation:

public void performanceTest() throws IOException {
    long totalTime = 0; // a local variable must be initialized before it is read
    File file = new File("/directory/to/ten/different/htmlFiles");
    for (int i = 0; i < 200; ++i) {
        for (File fileEntry : file.listFiles()) {

            HtmlStripper stripper = new HtmlStripper();
            URL url = fileEntry.toURI().toURL();
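            InputStream inputStream = url.openStream();
            String html;
            try {
                html = IOUtils.toString(inputStream, "UTF-8");
            } finally {
                inputStream.close(); // release the file handle (Java 6 has no try-with-resources)
            }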
            long start = System.currentTimeMillis();
            String text = stripper.getText(html);
            long end = System.currentTimeMillis();
            totalTime = totalTime + (end - start);

            // The time needed to strip each file is measured here (200 times per file).
            // The value decreases over the first passes and then becomes constant.
            // IMHO the duration for one and the same file should always remain the same.
            // Or does the JVM use some caching technique?

            System.out.println("time needed for stripping current file: " + (end - start));
        }
    }
    System.out.println("Average time for one document: "
            + (totalTime / 2000));

}

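However, the stripping time for each file is measured 200 times, and the measured values keep decreasing. IMHO the duration for one and the same file X should always remain the same!? Or is this a caching technique used by the JVM?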

Any help would be appreciated. Thanks in advance.

Regards

N.B.:
- I am testing locally on my machine (nothing remote, no HTTP).
- I am using Java 6 on Ubuntu 10.04.

Answer 1

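This is totally normal. The JIT compiles methods to native code and optimizes them more heavily as they get used more and more. (The "constant" your benchmark eventually converges to is the peak of the JIT's optimization capabilities.)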
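One common way to keep this effect out of the numbers is to run the code for a while before measuring. The following is only a minimal sketch of that idea, not the asker's actual setup: `WarmupBenchmark`, `stripTags` and the iteration counts are made up for illustration, and the regex-based stripper is just a stand-in for `HtmlStripper`.

```java
import java.util.regex.Pattern;

// Sketch: separate JIT warm-up from measurement.
// stripTags is a hypothetical stand-in for HtmlStripper.getText, not its real implementation.
final class WarmupBenchmark {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Results are written here so the timed calls cannot be discarded as dead code.
    static volatile int sink;

    // Naive tag removal, only here to give the JIT something to compile.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    // Calls stripTags `iterations` times and returns the mean time per call in nanoseconds.
    static long averageNanos(String html, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            String text = stripTags(html);
            total += System.nanoTime() - start;
            sink = text.length();
        }
        return total / iterations;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello, <b>world</b>!</p></body></html>";
        long cold = averageNanos(html, 100);   // first calls, mostly before compilation
        averageNanos(html, 20000);             // warm-up pass, result discarded (count is arbitrary)
        long warm = averageNanos(html, 20000); // measured after warm-up
        System.out.println("cold avg ns: " + cold + ", warm avg ns: " + warm);
    }
}
```

The exact numbers depend on the machine and the JVM, so only the relation between the two averages is of interest. For serious measurements a dedicated benchmarking harness is usually a better choice than a hand-written loop like this.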

Answer 2

"IMHO the duration for one and the same file X should always remain the same"

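Not in the presence of an optimizing just-in-time compiler. Among other things, it keeps track of how many times a particular method or branch is used, and selectively compiles Java bytecode to native code.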
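The effect of that selective compilation can be observed by timing the same method in successive rounds. This is a rough sketch under assumed names (`HotMethodDemo`, `countVowels`, `timeRound` are invented for illustration); the work itself is arbitrary and only needs to be called often enough to become "hot".

```java
// Sketch: time one method in successive rounds to see the per-round cost
// change as the method gets hot. All names here are hypothetical.
final class HotMethodDemo {

    // Deliberately simple CPU-bound work; a stand-in for any frequently called method.
    static int countVowels(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toLowerCase(s.charAt(i));
            if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
                n++;
            }
        }
        return n;
    }

    // Calls countVowels `calls` times and returns the elapsed nanoseconds for the round.
    static long timeRound(String input, int calls) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            checksum += countVowels(input);
        }
        long elapsed = System.nanoTime() - start;
        // Use the results so the loop is not removed as dead code.
        if (checksum != (long) calls * countVowels(input)) {
            throw new IllegalStateException("unexpected checksum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";
        for (int round = 1; round <= 10; round++) {
            System.out.println("round " + round + ": " + timeRound(input, 5000) + " ns");
        }
    }
}
```

On a HotSpot JVM, starting the program with `java -XX:+PrintCompilation HotMethodDemo` additionally logs when methods are compiled, which can be lined up with the rounds where the timings drop.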

Does the JVM cache method calls?
