Question

I'm trying to understand the results of some simple tests. First, the code with an array:

public class TestFillingArrayOfIntegers {

  public static void main(String[] args) {
    Integer[] intArray = new Integer[20_000_000];

    fill(intArray);
    fill(intArray);
    fill(intArray);
    fill(intArray);
    fill(intArray);
  }

  static void fill(Integer[] in) {
    long startTime = System.nanoTime();
    for (int i = 0; i < 20_000_000; i++) { in[i] = i; }
    System.out.println((System.nanoTime() - startTime) / 1_000_000 + " ms");
  }
}

It produces these numbers on my machine (Oracle Java 8 on an Intel i5 desktop running Windows 10):

4442 ms
6634 ms
1038 ms
7745 ms
1210 ms

Now the code with an ArrayList:

public class TestFillingArrayListOfIntegers {

  public static void main(String[] args) {
    java.util.ArrayList<Integer> intList = new java.util.ArrayList<>();

    fill(intList);

    intList.clear();
    fill(intList);

    intList.clear();
    fill(intList);

    intList.clear();
    fill(intList);

    intList.clear();
    fill(intList);
  }

  static void fill(java.util.ArrayList<Integer> in) {
    long startTime = System.nanoTime();
    for (int i = 0; i < 20_000_000; i++) { in.add(i); }
    System.out.println((System.nanoTime() - startTime) / 1_000_000 + " ms");
  }
}

Results:

5155 ms
965 ms
7415 ms
93 ms
902 ms

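I also tried Float, with the same results. The numbers are nearly identical from run to run.

Filling an array of primitives (i.e. int[]) is almost instantaneous.

I just can't understand why the numbers from successive method calls differ so much (I'm not asking here about the absolute values of these numbers). I have only one guess: filling with wrappers uses all the CPU cores of my machine, so unstable multithreading may be the cause.

I know about flawed benchmarks and that using JMH is preferable. I just wonder whether my simple tests are valid, and if not, please tell me why.

Thanks.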

=== [EDIT] Added JMH tests ===

# JMH version: 1.19
# VM version: JDK 1.8.0_151, VM 25.151-b12
# VM invoker: C:\Program Files\Java\jre1.8.0_151\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.testMethod

First, the code for the array:

package org.sample;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@State(Scope.Thread)
public class MyBenchmark {
  private Integer[] intArray;

  @Setup
  public void setup() {
    intArray = new Integer[20_000_000];
  }

  @Benchmark
  public void testMethod() {
    for (int i = 0; i < 20_000_000; i++) { intArray[i] = i; }
  }

}

Results:

# Run progress: 0,00% complete, ETA 00:00:30
# Fork: 1 of 3
# Warmup Iteration   1: 1607,614 ms/op
# Warmup Iteration   2: 7922,442 ms/op
# Warmup Iteration   3: 7430,643 ms/op
# Warmup Iteration   4: 1067,362 ms/op
# Warmup Iteration   5: 1257,112 ms/op
Iteration   1: 660,460 ms/op
Iteration   2: 1222,175 ms/op
Iteration   3: 664,795 ms/op
Iteration   4: 453,940 ms/op
Iteration   5: 460,370 ms/op

# Run progress: 33,33% complete, ETA 00:00:52
# Fork: 2 of 3
# Warmup Iteration   1: 1621,263 ms/op
# Warmup Iteration   2: 8021,981 ms/op
# Warmup Iteration   3: 7497,249 ms/op
# Warmup Iteration   4: 1052,803 ms/op
# Warmup Iteration   5: 1225,479 ms/op
Iteration   1: 642,912 ms/op
Iteration   2: 629,243 ms/op
Iteration   3: 644,419 ms/op
Iteration   4: 625,221 ms/op
Iteration   5: 449,515 ms/op

# Run progress: 66,67% complete, ETA 00:00:26
# Fork: 3 of 3
# Warmup Iteration   1: 1616,155 ms/op
# Warmup Iteration   2: 7972,240 ms/op
# Warmup Iteration   3: 7462,278 ms/op
# Warmup Iteration   4: 1039,186 ms/op
# Warmup Iteration   5: 1199,929 ms/op
Iteration   1: 635,411 ms/op
Iteration   2: 620,902 ms/op
Iteration   3: 635,565 ms/op
Iteration   4: 618,084 ms/op
Iteration   5: 443,779 ms/op


Result "org.sample.MyBenchmark.testMethod":
  627,119 ±(99.9%) 198,033 ms/op [Average]
  (min, avg, max) = (443,779, 627,119, 1222,175), stdev = 185,240
  CI (99.9%): [429,087, 825,152] (assumes normal distribution)


# Run complete. Total time: 00:01:19

Benchmark               Mode  Cnt    Score     Error  Units
MyBenchmark.testMethod  avgt   15  627,119 ± 198,033  ms/op

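The results still don't look stable; an error of about 1/3 of the score is too large. The first numbers are terrible.

Now the code for the ArrayList: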

package org.sample;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@State(Scope.Thread)
public class MyBenchmark {
  private java.util.ArrayList<Integer> intList;

  @Setup
  public void setup() {
    intList = new java.util.ArrayList<>();
  }

  @Benchmark
  public void testMethod() {
    for (int i = 0; i < 20_000_000; i++) { intList.add(i); }
  }

}

Results:

# Run progress: 0,00% complete, ETA 00:00:30
# Fork: 1 of 3
# Warmup Iteration   1: 6058,794 ms/op
# Warmup Iteration   2: 11194,466 ms/op
# Warmup Iteration   3: 1442,472 ms/op
# Warmup Iteration   4: 13665,291 ms/op
# Warmup Iteration   5: 1666,268 ms/op
Iteration   1: 3014,773 ms/op
Iteration   2: 3486,813 ms/op
Iteration   3: 41237,327 ms/op
Iteration   4: 1295,759 ms/op
Iteration   5: 28381,385 ms/op

# Run progress: 33,33% complete, ETA 00:04:17
# Fork: 2 of 3
# Warmup Iteration   1: 5965,381 ms/op
# Warmup Iteration   2: 11372,674 ms/op
# Warmup Iteration   3: 1483,890 ms/op
# Warmup Iteration   4: 13688,102 ms/op
# Warmup Iteration   5: 1699,179 ms/op
Iteration   1: 3055,685 ms/op
Iteration   2: 3433,376 ms/op
Iteration   3: 41953,165 ms/op
Iteration   4: 1316,909 ms/op
Iteration   5: 28855,626 ms/op

# Run progress: 66,67% complete, ETA 00:02:09
# Fork: 3 of 3
# Warmup Iteration   1: 6003,560 ms/op
# Warmup Iteration   2: 11353,880 ms/op
# Warmup Iteration   3: 1443,714 ms/op
# Warmup Iteration   4: 13688,473 ms/op
# Warmup Iteration   5: 2285,464 ms/op
Iteration   1: 3571,613 ms/op
Iteration   2: 4179,211 ms/op
Iteration   3: 41793,050 ms/op
Iteration   4: 1323,737 ms/op
Iteration   5: 28539,350 ms/op


Result "org.sample.MyBenchmark.testMethod":
  15695,852 ±(99.9%) 18165,612 ms/op [Average]
  (min, avg, max) = (1295,759, 15695,852, 41953,165), stdev = 16992,124
  CI (99.9%): [? 0, 33861,464] (assumes normal distribution)


# Run complete. Total time: 00:06:30

Benchmark               Mode  Cnt      Score       Error  Units
MyBenchmark.testMethod  avgt   15  15695,852 ± 18165,612  ms/op

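The ArrayList results are worse than the ones without JMH. The error is larger than the score!

Any ideas?

Here are some results with non-default GC settings. I tested the array, changing only one line to @Fork(value = 3, jvmArgsAppend = {"-Xms4096m", "-Xmx4096m", "-verbose:gc"}). Results: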

# JMH version: 1.19
# VM version: JDK 1.8.0_151, VM 25.151-b12
# VM invoker: C:\Program Files\Java\jre1.8.0_151\bin\java.exe
# VM options: -Xms4096m -Xmx4096m -verbose:gc
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.testMethod

# Run progress: 0,00% complete, ETA 00:00:30
# Fork: 1 of 3
# Warmup Iteration   1: [GC (Allocation Failure)  1048576K->471308K(4019712K), 0.5042073 secs]
253,738 ms/op
# Warmup Iteration   2: [GC (Allocation Failure)  1519884K->767524K(4019712K), 0.5134485 secs]
[GC (Allocation Failure)  1816100K->1063700K(4019712K), 0.4944016 secs]
239,485 ms/op
# Warmup Iteration   3: [GC (Allocation Failure)  2112276K->1359908K(4019712K), 0.4932206 secs]
[GC (Allocation Failure)  2408484K->1656100K(4019712K), 0.4927346 secs]
212,527 ms/op
# Warmup Iteration   4: [GC (Allocation Failure)  2704676K->1952292K(3437056K), 0.4950558 secs]
[GC (Allocation Failure)  2418212K->2248136K(3728384K), 0.5216161 secs]
274,812 ms/op
# Warmup Iteration   5: [GC (Allocation Failure)  2714056K->2248120K(3728384K), 1.1181796 secs]
1193,570 ms/op
Iteration   1: [GC (Allocation Failure)  2714040K->2248136K(3728384K), 1.1205924 secs]
635,233 ms/op
Iteration   2: [GC (Allocation Failure)  2714056K->2248208K(3728384K), 1.1172824 secs]
1191,058 ms/op
Iteration   3: [GC (Allocation Failure)  2714128K->2248128K(3728384K), 1.1204173 secs]
634,724 ms/op
Iteration   4: [GC (Allocation Failure)  2714048K->2248168K(3728384K), 1.1276622 secs]
1204,162 ms/op
Iteration   5: [GC (Allocation Failure)  2714088K->2248208K(3728384K), 1.1234440 secs]
637,095 ms/op

# Run progress: 33,33% complete, ETA 00:00:26
# Fork: 2 of 3
# Warmup Iteration   1: [GC (Allocation Failure)  1048576K->471323K(4019712K), 0.5115167 secs]
252,752 ms/op
# Warmup Iteration   2: [GC (Allocation Failure)  1519899K->767531K(4019712K), 0.5080484 secs]
[GC (Allocation Failure)  1816107K->1063715K(4019712K), 0.4911580 secs]
238,002 ms/op
# Warmup Iteration   3: [GC (Allocation Failure)  2112291K->1359891K(4019712K), 0.4941844 secs]
[GC (Allocation Failure)  2408467K->1656083K(4019712K), 0.4919721 secs]
211,512 ms/op
# Warmup Iteration   4: [GC (Allocation Failure)  2704659K->1952275K(3437056K), 0.5040378 secs]
[GC (Allocation Failure)  2418195K->2248131K(3728384K), 0.5245958 secs]
276,411 ms/op
# Warmup Iteration   5: [GC (Allocation Failure)  2714051K->2248107K(3728384K), 1.1093389 secs]
1180,963 ms/op
Iteration   1: [GC (Allocation Failure)  2714027K->2248147K(3728384K), 1.1624603 secs]
652,570 ms/op
Iteration   2: [GC (Allocation Failure)  2714067K->2248155K(3728384K), 1.1054001 secs]
1177,282 ms/op
Iteration   3: [GC (Allocation Failure)  2714075K->2248147K(3728384K), 1.1469809 secs]
645,297 ms/op
Iteration   4: [GC (Allocation Failure)  2714067K->2248155K(3728384K), 1.1026592 secs]
1177,270 ms/op
Iteration   5: [GC (Allocation Failure)  2714075K->2248131K(3728384K), 1.1547916 secs]
651,064 ms/op

# Run progress: 66,67% complete, ETA 00:00:13
# Fork: 3 of 3
# Warmup Iteration   1: [GC (Allocation Failure)  1048576K->471299K(4019712K), 0.5202269 secs]
256,682 ms/op
# Warmup Iteration   2: [GC (Allocation Failure)  1519875K->767531K(4019712K), 0.5250656 secs]
[GC (Allocation Failure)  1816107K->1063731K(4019712K), 0.4992968 secs]
242,321 ms/op
# Warmup Iteration   3: [GC (Allocation Failure)  2112307K->1359939K(4019712K), 0.5023238 secs]
[GC (Allocation Failure)  2408515K->1656131K(4019712K), 0.5024830 secs]
215,224 ms/op
# Warmup Iteration   4: [GC (Allocation Failure)  2704707K->1952323K(3437056K), 0.5032048 secs]
[GC (Allocation Failure)  2418243K->2248127K(3728384K), 0.5347739 secs]
281,705 ms/op
# Warmup Iteration   5: [GC (Allocation Failure)  2714047K->2248167K(3728384K), 1.1136005 secs]
1188,756 ms/op
Iteration   1: [GC (Allocation Failure)  2714087K->2248143K(3728384K), 1.1517189 secs]
648,966 ms/op
Iteration   2: [GC (Allocation Failure)  2714063K->2248151K(3728384K), 1.1092591 secs]
1183,677 ms/op
Iteration   3: [GC (Allocation Failure)  2714071K->2248199K(3728384K), 1.1160382 secs]
631,461 ms/op
Iteration   4: [GC (Allocation Failure)  2714119K->2248207K(3728384K), 1.1566076 secs]
1238,393 ms/op
Iteration   5: [GC (Allocation Failure)  2714127K->2248215K(3728384K), 1.1143401 secs]
631,711 ms/op


Result "org.sample.MyBenchmark.testMethod":
  862,664 ±(99.9%) 301,000 ms/op [Average]
  (min, avg, max) = (631,461, 862,664, 1238,393), stdev = 281,556
  CI (99.9%): [561,664, 1163,665] (assumes normal distribution)


# Run complete. Total time: 00:00:40

Benchmark               Mode  Cnt    Score     Error  Units
MyBenchmark.testMethod  avgt   15  862,664 ± 301,000  ms/op

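I don't see that this helped at all. I don't understand all these "GC (Allocation Failure)" entries. I have read about them, but what causes the pauses? I gave the JVM 4 GB of heap.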

Answer 1

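There are a few possible sources of "instability" in this example:

1. JIT compilation overhead.
2. Heap warmup, as the heap grows to the size required by the working set and the JVM deals with garbage created during class loading and JIT compilation.
3. Garbage collection caused by your application's memory churn.

One reason you get different behavior for lists of wrappers versus arrays of primitives is that creating each wrapper instance allocates a new object on the heap. In the primitive case, there are no wrapper objects to allocate.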
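The allocation behind each boxed value can be made visible with a small sketch. It relies on the fact that autoboxing an int goes through Integer.valueOf, which is only required to reuse objects for values from -128 to 127; on a default HotSpot configuration, values outside that range are assumed to get a fresh object each time. The class and method names (BoxingAllocation, countFreshBoxes) are hypothetical and exist only for this illustration.

```java
public class BoxingAllocation {

    // Boxes every value in [from, to) twice and counts how often the two
    // boxes are different objects, i.e. how often boxing allocated.
    public static int countFreshBoxes(int from, int to) {
        int fresh = 0;
        for (int i = from; i < to; i++) {
            Integer a = i; // autoboxing, compiled as Integer.valueOf(i)
            Integer b = i;
            if (a != b) {  // reference comparison on purpose
                fresh++;
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        // Values in the guaranteed cache range share one object per value.
        System.out.println("fresh boxes in [-128, 128): " + countFreshBoxes(-128, 128));
        // Assumption: default cache size, so larger values are allocated anew.
        System.out.println("fresh boxes in [1000, 2000): " + countFreshBoxes(1_000, 2_000));
    }
}
```

Applied to the loops in the question, this suggests that nearly all of the 20,000,000 stores create a new Integer, and that every repeated fill turns the previous batch into garbage.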

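"I know about flawed benchmarks and that using JMH is preferable. I just wonder whether my simple tests are valid, and if not, please tell me why."

They are not valid, for the reasons given above (points 1 and 2).

Note, however, that "eliminating" the GC overhead (point 3 above) from the measurements would distort the results. When comparing arrays and lists, GC overhead should be counted as part of the overall cost.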
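A rough way to keep GC cost in the measurement while still seeing how large it is: read the JVM's collector counters before and after the timed loop. This is a minimal sketch, not a replacement for JMH. GcAwareTiming, totalGcMillis and timedFill are hypothetical names, the array is smaller than in the question so that it runs on a default heap, and the value returned by getCollectionTime is an approximate accumulated time, so the split is indicative only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcAwareTiming {

    // Approximate accumulated collection time in ms over all collectors.
    // A collector may report -1 ("undefined"); such values are skipped.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fills the array with boxed values and returns
    // { elapsed wall-clock ms, GC ms accumulated during the fill }.
    public static long[] timedFill(Integer[] target) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        for (int i = 0; i < target.length; i++) {
            target[i] = i;
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        return new long[] { elapsedMs, gcMs };
    }

    public static void main(String[] args) {
        Integer[] data = new Integer[2_000_000]; // assumption: reduced size
        for (int run = 0; run < 5; run++) {
            long[] r = timedFill(data);
            System.out.println(r[0] + " ms total, about " + r[1] + " ms reported by GC");
        }
    }
}
```

If the runs with large totals are also the runs with large GC figures, that points to collection pauses rather than the loop itself as the source of the variation.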

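Use JMH.

This seems to be the crux of your question:

"I still wonder whether unpredictable GC pauses are possible"

GC pauses are effectively unpredictable, if that is what you mean.

If you try hard enough and know enough about the current heap size, the GC parameters and so on, it may be possible in theory to predict when a pause will occur¹. However, making such predictions a priori is impractical. There are too many variables.

"... is it possible to get rid of them by tuning the GC".

GC pauses cannot be completely eliminated by tuning². You can reduce their length (by using a low-pause collector), but you cannot eliminate them. The price of the reduction is that the JVM spends more time overall on garbage collection and related overheads.

They are a fact of life.

"It seems impossible to work with a 20M-element array without GC pauses."

Correct. Unless ... you can design your application so that it does not allocate any new objects after startup³.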
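As an illustration of that last idea only, the sketch below allocates a primitive buffer once and then overwrites it in place, so the repeated fills create no objects. PreallocatedBuffer and its methods are hypothetical names, and the sketch assumes int values are acceptable where the question used Integer. The println in main does allocate a little; a real allocation-free design would have to avoid that too.

```java
public class PreallocatedBuffer {

    private final int[] values; // allocated once, at startup

    public PreallocatedBuffer(int capacity) {
        values = new int[capacity];
    }

    // Overwrites the buffer in place; no boxing and no new objects.
    public void fill() {
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
    }

    // Reads the buffer back so the stores are observable.
    public long sum() {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        PreallocatedBuffer buffer = new PreallocatedBuffer(20_000_000);
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            buffer.fill();
            System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}
```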

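^{1 - For example, in your benchmarks the code is very simple and the behavior is reproducible.}

^{2 - Even with low-pause collectors such as CMS and G1, a "new space" collection pauses all non-GC threads. The pause is relatively short ... provided you haven't made the "new" space too large. But you cannot eliminate it.}

^{3 - In theory, an application that allocates no objects during normal operation should generate no garbage, and therefore have no GC pauses after JVM warmup ... but to achieve this, you would need to write your application to avoid most of the standard Java SE classes. Very difficult.}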

What causes unstable performance when filling arrays and collections with primitive wrappers?
