Question

Suppose I create an array that is used to simulate processor memory:

byte[] mem = new byte[0xF00];

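This array is used throughout the emulation and will eventually (read: frequently) need to be discarded or reset. My question is: which is faster, and why?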

mem = new byte[0xF00];

or:

for(int i = 0; i < mem.length; i++) mem[i] = 0;

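This may seem insignificant, but when emulating a large number of processors, small efficiencies add up. The difference in speed will come from the JVM's garbage collection. In the first case, the old array must be discarded and garbage collected, and the JVM has to allocate (and presumably zero) new memory. In the second, that JVM cost is avoided, but we still have to iterate over every element in the array.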

As additional follow-ups to this question:

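Does the cost ratio change with the size of the data type? For example, what about a short[]?
Does the length of the array affect the cost ratio?
Most importantly, why?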
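
For reference, the two reset strategies being compared can be written side by side. This is only an illustrative sketch: the class name EmulatedMemory and its method names are hypothetical, and Arrays.fill stands in for the hand-written loop as its standard-library equivalent.

```java
import java.util.Arrays;

// Hypothetical wrapper around the emulated memory; all names are illustrative only.
class EmulatedMemory {
    static final int SIZE = 0xF00; // 3840 bytes, as in the snippet above

    private byte[] mem = new byte[SIZE];

    // Option 1: drop the old array and allocate a fresh, zero-initialized one.
    void resetByReallocating() {
        mem = new byte[SIZE];
    }

    // Option 2: keep the same array and overwrite every element with zero.
    void resetByFilling() {
        Arrays.fill(mem, (byte) 0);
    }

    byte read(int address) {
        return mem[address];
    }

    void write(int address, byte value) {
        mem[address] = value;
    }

    public static void main(String[] args) {
        EmulatedMemory m = new EmulatedMemory();

        m.write(0x10, (byte) 42);
        m.resetByFilling();
        System.out.println(m.read(0x10)); // prints 0

        m.write(0x10, (byte) 42);
        m.resetByReallocating();
        System.out.println(m.read(0x10)); // prints 0
    }
}
```

Both methods leave the memory in the same observable state; the open question is only what each one costs.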

Answer 1

You can test it for yourself, but dropping and re-creating the array takes about the same time.

However, it has two downsides:

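It churns through the CPU data cache, reducing its effectiveness.
It makes triggering a GC more likely, especially if you do this often, which pauses the system or slows it down (if the collector is concurrent).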

I prefer to reuse the array, not because it is the fastest, but because it has the least impact on the rest of your application.

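import java.util.Arrays;

public class AllocVsFill {
    public static void main(String[] args) {
        // Array sizes from 16 to 16K longs, i.e. 0.125 KB to 128 KB
        for (int size = 16; size <= 16 * 1024; size *= 2) {
            int count1 = 0, count1b = 0, count2 = 0;
            long total1 = 0, total1b = 0, total2 = 0;
            // Step by size so every array size processes the same number of elements
            for (long i = 0; i < 10000000000L; i += size) {
                long start = System.nanoTime();
                long[] longs = new long[size];
                // Read the new array so the allocation is actually used
                if (longs[0] + longs[longs.length - 1] != 0)
                    throw new AssertionError();
                long mid = System.nanoTime();
                long time1 = mid - start;
                Arrays.fill(longs, 1L);
                long time2 = System.nanoTime() - mid;
                // Allocation time including any GC pauses
                count1b++;
                total1b += time1;
                if (time1 < 10e3) { // under 10 microseconds: assume no GC
                    total1 += time1;
                    count1++;
                }
                if (time2 < 10e3) { // under 10 microseconds: assume no GC
                    total2 += time2;
                    count2++;
                }
            }
            System.out.printf("%s KB took on average of %,d ns to allocate, %,d ns to allocate including GCs and %,d ns to fill%n",
                    size * 8 / 1024.0, total1 / count1, total1b / count1b, total2 / count2);
        }
    }
}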

prints

0.125 KB took on average of 35 ns to allocate, 36 ns to allocate including GCs and 19 ns to fill
0.25 KB took on average of 39 ns to allocate, 40 ns to allocate including GCs and 31 ns to fill
0.5 KB took on average of 56 ns to allocate, 58 ns to allocate including GCs and 55 ns to fill
1.0 KB took on average of 75 ns to allocate, 77 ns to allocate including GCs and 117 ns to fill
2.0 KB took on average of 129 ns to allocate, 134 ns to allocate including GCs and 232 ns to fill
4.0 KB took on average of 242 ns to allocate, 248 ns to allocate including GCs and 368 ns to fill
8.0 KB took on average of 479 ns to allocate, 496 ns to allocate including GCs and 644 ns to fill
16.0 KB took on average of 1,018 ns to allocate, 1,055 ns to allocate including GCs and 1,189 ns to fill
32.0 KB took on average of 2,119 ns to allocate, 2,200 ns to allocate including GCs and 2,625 ns to fill
64.0 KB took on average of 4,419 ns to allocate, 4,604 ns to allocate including GCs and 4,728 ns to fill
128.0 KB took on average of 8,333 ns to allocate, 9,472 ns to allocate including GCs and 8,685 ns to fill

This only shows that it is hard to assume one approach is faster than the other in all cases.

If I change the long[] to an int[], I see much the same:

0.125 KB took on average of 35 ns to allocate, 36 ns to allocate including GCs and 16 ns to fill
0.25 KB took on average of 40 ns to allocate, 41 ns to allocate including GCs and 24 ns to fill
0.5 KB took on average of 58 ns to allocate, 60 ns to allocate including GCs and 40 ns to fill
1.0 KB took on average of 86 ns to allocate, 87 ns to allocate including GCs and 94 ns to fill
2.0 KB took on average of 139 ns to allocate, 143 ns to allocate including GCs and 149 ns to fill
4.0 KB took on average of 256 ns to allocate, 262 ns to allocate including GCs and 206 ns to fill
8.0 KB took on average of 472 ns to allocate, 481 ns to allocate including GCs and 317 ns to fill
16.0 KB took on average of 981 ns to allocate, 999 ns to allocate including GCs and 516 ns to fill
32.0 KB took on average of 2,098 ns to allocate, 2,146 ns to allocate including GCs and 1,458 ns to fill
64.0 KB took on average of 4,312 ns to allocate, 4,445 ns to allocate including GCs and 4,028 ns to fill
128.0 KB took on average of 8,497 ns to allocate, 9,072 ns to allocate including GCs and 7,141 ns to fill
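
To check the second downside (more frequent collections) on your own setup, one option is to read the collection counters the JVM exposes through the standard java.lang.management API. The sketch below is only an assumption about how you might measure it: the class name GcCountProbe and the iteration count are made up for illustration, and the number printed depends entirely on the JVM, heap size, and collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe; the class name and loop size are illustrative only.
class GcCountProbe {
    // Sums the collection counts of all collectors.
    // getCollectionCount() returns -1 when the count is undefined, so those are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        long checksum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            byte[] mem = new byte[0xF00]; // re-create instead of reusing
            mem[i % mem.length] = 1;      // touch the array so it is really used
            checksum += mem[0];
        }
        long after = totalGcCount();
        System.out.println("collections during loop: " + (after - before)
                + " (checksum " + checksum + ")");
    }
}
```

Running the same loop with a single reused array and Arrays.fill gives a baseline to compare the counter against.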

Answer 2

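Reallocating the array does not actually increase the cost of each GC, because the GC only visits and copies live objects and does nothing with dead ones. However, allocating objects causes minor GCs to occur more often. If none of the recently allocated objects are still alive, though, a minor GC is very cheap, and no major GCs are caused at all.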

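Also, object allocation is cheap in current Java versions, and the zeroing of the allocation space can safely be assumed to be the most efficient zeroing the JVM can achieve. If you manage to zero the array in your own code as fast as the JVM does (edit: as Steven Schlansker mentioned, the JIT compiler may optimize memory-filling loops), reusing the array should be faster. In any case, until the for loop you showed has been optimized by the JIT compiler, I expect it to be considerably slower.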

To answer your other questions:

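The GC zeros the allocation space (Eden) in one go, so it makes no difference whether it holds a short[] or a byte[]. Your for loop, however, needs only half as many iterations to zero the same number of bytes when using a short[] instead of a byte[] (setting a byte or a short to 0 should not make any difference).
The longer the array, the more iterations your for loop needs, so this growth is linear. The GC also needs amortized linear time to zero the range of bytes involved, so I think the ratio between the two approaches stays constant. However, there may be more efficient ways to zero large memory regions than small ones, which would make the zeroing done by the GC (the whole allocation space at once) more efficient than the loop approach. For very large arrays the situation may change: those are allocated directly in the Tenured generation (unless G1 is used) and will therefore cause more expensive major GCs.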
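
The iteration-count point is plain arithmetic and can be sketched as follows. The class and method names are hypothetical; Byte.BYTES and Short.BYTES are the standard constants for the element sizes. It only counts loop iterations and says nothing about how the JIT compiler may later transform the loop.

```java
// Hypothetical helper; names are illustrative only.
class IterationCounts {
    // Loop iterations needed to zero `bytes` bytes of storage
    // when each array element is `elementSize` bytes wide.
    static int iterations(int bytes, int elementSize) {
        return bytes / elementSize;
    }

    public static void main(String[] args) {
        int bytes = 0xF00; // 3840 bytes of emulated memory

        byte[] asBytes = new byte[iterations(bytes, Byte.BYTES)];
        short[] asShorts = new short[iterations(bytes, Short.BYTES)];

        System.out.println(asBytes.length);  // prints 3840
        System.out.println(asShorts.length); // prints 1920
    }
}
```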

Answer 3

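I agree with the observation that reusing the array has the least impact on the application, but your particular case does not seem to put much pressure on the GC: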

for(int i = 0; i < mem.length; i++) mem[i] = 0;

In the loop above (mem.length is 0xF00, i.e. 3840), there will be about 2*3840 assignments and 3840 comparisons.
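
As a sanity check on that count, the loop can be instrumented to tally its own source-level operations. This is a hypothetical sketch (the class LoopOpCount is not from the thread), and it counts what the source code expresses, not what the JIT-compiled machine code ends up doing. Note that 0xF00 is 3840.

```java
// Hypothetical instrumented version of the zeroing loop; names are illustrative only.
class LoopOpCount {
    // Returns {assignments, comparisons} for the loop
    //   for (int i = 0; i < mem.length; i++) mem[i] = 0;
    // The initial `int i = 0` is not counted.
    static long[] count(byte[] mem) {
        long assignments = 0;
        long comparisons = 0;
        int i = 0;
        while (true) {
            comparisons++;            // evaluates i < mem.length
            if (!(i < mem.length)) {
                break;
            }
            mem[i] = 0;
            assignments++;            // element store
            i++;
            assignments++;            // counter update
        }
        return new long[] { assignments, comparisons };
    }

    public static void main(String[] args) {
        long[] result = count(new byte[0xF00]);
        System.out.println(result[0]); // prints 7680
        System.out.println(result[1]); // prints 3841
    }
}
```

The extra comparison is the final failing check that ends the loop.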

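Now, in the case of GC, during the sweep or memory deallocation phase for this particular object, the whole memory chunk is deallocated at once, which IMO should be faster than the figures for the loop above.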

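However, the real cost of GC to application performance shows up when the code or application behavior causes too many GC cycles (worst of all if those are frequent major cycles). Your specific case does not indicate heavier GC activity.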

I think the loop approach is the better one for a byte[]. If it were an Object[], we might take a different approach.

Answer 4

I would definitely go with mem = new byte[0xF00]; and let the GC do the rest.

Memory usage may be slightly higher, but it won't affect your application unless you do this thousands of times per second.

Execution will be much faster, and there is no need to call the GC manually; it will do its job anyway.

Answer 5

There are 4 important factors here.

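1) What is the target platform? (Does it have lots of RAM? Multiple CPU cores?)
2) What is the largest amount of memory you plan to allocate? (Larger amounts may favor allocation/deallocation.)
3) Which JVM do you plan to use?
4) If your application is performance-critical, why are you developing it in Java?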

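More to the point, I would say, "don't worry about premature optimization". Write the software first, then profile it, then optimize the parts that run slowly. As a general rule, algorithm performance is usually a bigger issue than data structure performance, especially when your data structure is basically just a blank address space.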

Is it faster to drop and re-create an array or to fill it with zeros, and why?
