Question

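Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If I have to read from or write to a direct ByteBuffer, is it better to first read or write into a thread-local byte array and then update the direct ByteBuffer (for writes) in full from that byte array?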

Answer 1

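Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If you compare a heap buffer with a direct buffer that does not use native byte order (most systems are little-endian, while a direct ByteBuffer defaults to big-endian), the performance is very similar.

If you use native-ordered byte buffers, performance can be significantly better for multi-byte values. For single bytes it makes little difference whatever you do.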
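As a small illustration of the byte-order point, the sketch below writes the same int through a default-order direct buffer and a native-order one and looks at the first byte stored. The class name `ByteOrderDemo` and the helper `firstByteOf` are made up for this example; it shows only the memory layout, not the performance difference.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Writes 0x01020304 at offset 0 and returns the first byte actually stored.
    static byte firstByteOf(ByteBuffer buffer) {
        buffer.putInt(0, 0x01020304);
        return buffer.get(0);
    }

    public static void main(String[] args) {
        // A new ByteBuffer is big-endian regardless of the platform.
        ByteBuffer defaultOrder = ByteBuffer.allocateDirect(4);
        // Explicitly switch to whatever the hardware uses.
        ByteBuffer nativeOrder = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());

        System.out.println("default order: " + defaultOrder.order());
        System.out.println("native order:  " + nativeOrder.order());
        // Big-endian stores the most significant byte (0x01) first;
        // on a little-endian machine the native buffer stores 0x04 first.
        System.out.printf("first byte, default order: 0x%02x%n", firstByteOf(defaultOrder));
        System.out.printf("first byte, native order:  0x%02x%n", firstByteOf(nativeOrder));
    }
}
```

When the buffer order differs from the hardware order, every multi-byte access has to swap bytes, which is presumably where the extra cost mentioned above comes from.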

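In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM-dependent; AFAIK the Android VM treats them as intrinsics in recent versions.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine-code instruction, i.e. they don't have the overhead of a JNI call.

In fact, if you are into micro-tuning, you may find that most of the time in a ByteBuffer getXxxx or putXxxx call is spent on bounds checking, not on the actual memory access. For this reason I still use Unsafe directly when I have to for maximum performance (note: Oracle discourages this).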
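To make the bounds checking visible, here is a minimal sketch that performs absolute `getInt` reads at valid and invalid offsets. The class name `BoundsCheckDemo` and the helper `describeAccess` are hypothetical; the example only demonstrates that each access is validated against the buffer limit and says nothing about how much time the check costs.

```java
import java.nio.ByteBuffer;

public class BoundsCheckDemo {
    // Attempts an absolute 4-byte read and reports whether the check passed.
    static String describeAccess(ByteBuffer buffer, int index) {
        try {
            buffer.getInt(index);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(8);
        // Offsets 0 and 4 leave room for a full int inside the 8-byte buffer.
        System.out.println("index 0: " + describeAccess(buffer, 0));
        System.out.println("index 4: " + describeAccess(buffer, 4));
        // Offset 5 would read bytes 5..8, one past the limit, so it is rejected.
        System.out.println("index 5: " + describeAccess(buffer, 5));
    }
}
```

Unsafe performs no such validation, so a bad address there can corrupt memory or crash the JVM instead of throwing.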

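If I have to read from or write to a direct ByteBuffer, is it better to first read or write into a thread-local byte array and then update the direct ByteBuffer (for writes) in full from that byte array?

I would hate to see what that is better than. ;) It sounds very complicated.

Often the simplest solution is both better and faster.

You can test this yourself with the following code.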

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // The class name is arbitrary; the original snippet showed only the methods.
    public class DirectBufferTest {
        public static void main(String... args) {
            ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            // Repeat the test so the JIT has a chance to compile the loop.
            for (int i = 0; i < 10; i++)
                runTest(bb1, bb2);
        }

        private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
            bb1.clear();
            bb2.clear();
            long start = System.nanoTime();
            // Copy bb1 into bb2 one int at a time.
            while (bb2.remaining() > 0)
                bb2.putInt(bb1.getInt());
            long time = System.nanoTime() - start;
            // One getInt plus one putInt for every 4 bytes.
            int operations = bb1.capacity() / 4 * 2;
            System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
        }
    }

prints

    Each putInt/getInt took an average of 83.9 ns
    Each putInt/getInt took an average of 1.4 ns
    Each putInt/getInt took an average of 34.7 ns
    Each putInt/getInt took an average of 1.3 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.3 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.2 ns

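I am pretty sure a JNI call takes longer than 1.2 ns.

To demonstrate that it is not the "JNI" call but the guff around it that causes the delay, you can write the same loop using Unsafe directly.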

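    import java.lang.reflect.Field;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import sun.misc.Unsafe;
    import sun.nio.ch.DirectBuffer;

    // The class name is arbitrary. On JDK 9 and later, the sun.nio.ch import is expected
    // to need --add-exports java.base/sun.nio.ch=ALL-UNNAMED when compiling and running.
    public class UnsafeCopyTest {
        public static void main(String... args) {
            ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            for (int i = 0; i < 10; i++)
                runTest(bb1, bb2);
        }

        private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
            Unsafe unsafe = getTheUnsafe();
            long start = System.nanoTime();
            // Raw native addresses of the two direct buffers.
            long addr1 = ((DirectBuffer) bb1).address();
            long addr2 = ((DirectBuffer) bb2).address();
            // Copy one int at a time with no bounds checks at all.
            for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
                unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
            long time = System.nanoTime() - start;
            int operations = bb1.capacity() / 4 * 2;
            System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
        }

        // Obtains the Unsafe singleton via reflection, since Unsafe.getUnsafe() rejects application code.
        public static Unsafe getTheUnsafe() {
            try {
                Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
                theUnsafe.setAccessible(true);
                return (Unsafe) theUnsafe.get(null);
            } catch (Exception e) {
                throw new AssertionError(e);
            }
        }
    }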

prints

    Each putInt/getInt took an average of 40.4 ns
    Each putInt/getInt took an average of 44.4 ns
    Each putInt/getInt took an average of 0.4 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns

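So you can see that the native call is much faster than you would expect of a JNI call. The main reason for the remaining delay is probably L2 cache speed. ;)

All of this was run on a 3.3 GHz i3.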

Answer 2

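A direct buffer holds its data in JNI land, so get() and put() have to cross the JNI boundary. A non-direct buffer holds its data in JVM land.

So:

If you aren't playing with the data in Java land at all, e.g. you are just copying one channel to another, direct buffers are faster, because the data never has to cross the JNI boundary.
Conversely, if you are playing with the data in Java land, a non-direct buffer will be faster. Whether the difference is significant depends on how much data has to cross the JNI boundary and on the size of the quanta transferred each time. For example, getting or putting a single byte at a time from or to a direct buffer could get very expensive, whereas getting or putting 16384 bytes at a time would amortize the JNI boundary cost considerably.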
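The channel-to-channel case can be sketched as below. The class name `ChannelCopy` and both method names are made up for illustration, and the in-memory streams are used only to keep the example self-contained: channels wrapped around plain streams copy through heap arrays internally, so the benefit described above would be expected only with channels such as FileChannel or SocketChannel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class ChannelCopy {
    // Copies everything from in to out through one direct buffer.
    // Java code never inspects the bytes; it only moves the buffer between channels.
    static long copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16384);
        long total = 0;
        while (in.read(buffer) != -1) {
            buffer.flip();                      // switch from filling to draining
            while (buffer.hasRemaining()) {
                total += out.write(buffer);     // a write may be partial, so loop
            }
            buffer.clear();                     // ready for the next read
        }
        return total;
    }

    // Convenience wrapper for the demo: copies a byte array through two channels.
    static byte[] copyBytes(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(data));
             WritableByteChannel out = Channels.newChannel(sink)) {
            copy(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] copied = copyBytes("hello, channels".getBytes());
        System.out.println(new String(copied));
    }
}
```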

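To answer your second paragraph: I would use a local byte[] array, not a thread-local one, but if I were playing with the data in Java land I wouldn't use a direct byte buffer at all. As the Javadoc says, direct byte buffers should be used only where they deliver a measurable performance benefit.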
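A minimal sketch of the local-array approach, assuming the data has to be processed in Java: the bulk `get(byte[], int, int)` moves a whole chunk out of the direct buffer per call, and the loop then works on the array. The class name `BulkTransfer`, the method `sumBytes`, and the 16384-byte chunk size (borrowed from the figure above) are illustrative choices, not a recommendation backed by measurements.

```java
import java.nio.ByteBuffer;

public class BulkTransfer {
    // Sums every byte in the buffer, reading it in chunks through a local array.
    static long sumBytes(ByteBuffer direct) {
        byte[] chunk = new byte[16384];          // plain local array, not a ThreadLocal
        ByteBuffer view = direct.duplicate();    // own position/limit; caller's buffer is untouched
        view.clear();                            // position 0, limit = capacity
        long sum = 0;
        while (view.hasRemaining()) {
            int n = Math.min(chunk.length, view.remaining());
            view.get(chunk, 0, n);               // one bulk transfer per chunk
            for (int i = 0; i < n; i++) {
                sum += chunk[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(40000);
        for (int i = 0; i < direct.capacity(); i++) {
            direct.put(i, (byte) 1);
        }
        System.out.println("sum = " + sumBytes(direct));
    }
}
```

Whether this beats per-element access is worth measuring on the target JVM, given that the two answers disagree about how expensive individual get/put calls on a direct buffer are.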

Comparing direct and non-direct ByteBuffer get/put operations
