为什么JVM没有在Windows x86上发出预取指令

时间:2017-06-04 14:34:55

标签: java assembly x86 jvm jvm-hotspot

正如标题所述,为什么OpenJDK JVM不会在Windows x86上发出预取指令?请参阅OpenJDK Mercurial @ http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/windows_x86/vm/prefetch_windows_x86.inline.hpp

inline void Prefetch::read (void *loc, intx interval) {}
inline void Prefetch::write(void *loc, intx interval) {}

没有评论,除​​了源代码,我没有找到其他资源。我问,因为它对Linux x86这样做,请参阅http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/linux_x86/vm/prefetch_linux_x86.inline.hpp

inline void Prefetch::read (void *loc, intx interval) {
#ifdef AMD64
  __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval));
#endif // AMD64
}

inline void Prefetch::write(void *loc, intx interval) {
#ifdef AMD64

  // Do not use the 3dnow prefetchw instruction.  It isn't supported on em64t.
  //  __asm__ ("prefetchw (%0,%1,1)" : : "r" (loc), "r" (interval));
  __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval));

#endif // AMD64
}

2 个答案:

答案 0 :(得分:8)

JDK-4453409所示,预取是在JDK 1.4中的HotSpot JVM中实现的,以加速GC。那是超过15年前,没有人会记得为什么它没有在Windows上实现。我的猜测是Visual Studio(它一直用于在Windows上构建HotSpot)在这些时候基本上不理解预取指令。看起来像是一个需要改进的地方。

无论如何,您询问的代码是由JVM垃圾收集器内部使用的。这不是JIT生成的。 C2 JIT代码生成器规则位于体系结构定义文件x86_64.ad中,有rulesPrefetchReadPrefetchWritePrefetchAllocation个节点转换为相应的x64指令

一个有趣的事实是PrefetchReadPrefetchWrite节点不会在代码中的任何位置创建。它们仅用于支持Unsafe.prefetchX内在函数,但在JDK 9中它们是removed

JIT生成预取指令的唯一情况是PrefetchAllocation节点。您可以使用-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly验证在对象分配后确实生成了PREFETCHNTA,在Linux和Windows 上都

class Test {
    public static void main(String[] args) {
        byte[] b = new byte[0];
        for (;;) {
            b = Arrays.copyOf(b, b.length + 1);
        }
    }
}

java.exe -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Test

# {method} {0x00000000176124e0} 'main' '([Ljava/lang/String;)V' in 'Test'
  ...
  0x000000000340e512: cmp    $0x100000,%r11d
  0x000000000340e519: ja     0x000000000340e60f
  0x000000000340e51f: movslq 0x24(%rsp),%r10
  0x000000000340e524: add    $0x1,%r10
  0x000000000340e528: add    $0x17,%r10
  0x000000000340e52c: mov    %r10,%r8
  0x000000000340e52f: and    $0xfffffffffffffff8,%r8
  0x000000000340e533: cmp    $0x100000,%r11d
  0x000000000340e53a: ja     0x000000000340e496
  0x000000000340e540: mov    0x60(%r15),%rbp
  0x000000000340e544: mov    %rbp,%r9
  0x000000000340e547: add    %r8,%r9
  0x000000000340e54a: cmp    0x70(%r15),%r9
  0x000000000340e54e: jae    0x000000000340e496
  0x000000000340e554: mov    %r9,0x60(%r15)
  0x000000000340e558: prefetchnta 0xc0(%r9)
  0x000000000340e560: movq   $0x1,0x0(%rbp)
  0x000000000340e568: prefetchnta 0x100(%r9)
  0x000000000340e570: movl   $0x200000f5,0x8(%rbp)  ;   {metadata({type array byte})}
  0x000000000340e577: mov    %r11d,0xc(%rbp)
  0x000000000340e57b: prefetchnta 0x140(%r9)
  0x000000000340e583: prefetchnta 0x180(%r9)    ;*newarray
                                                ; - java.util.Arrays::copyOf@1 (line 3236)
                                                ; - Test::main@9 (line 9)

答案 1 :(得分:6)

你引用的文件都有asm代码片段(inline assembler),某些C / C ++软件在自己的代码中使用(apangin, the JVM expert pointed,主要是在GC代码中)。实际上存在差异:x86_64热点的LinuxSolarisBSD变体在热点中有预取,而Windows则禁用/未实现这部分奇怪,部分无法解释为什么,以及它也可能使Windows上的JVM位(某些百分比;在没有硬件预取的平台上更多)更慢,但仍然无助于为Sun / Oracle销售更多solaris / solaris支付合同。 Ross also guessed MS C ++编译器可能不支持内联asm语法,但_mm_prefetch应该(谁将打开JDK bug来添加它to the file?)。

JVM热点是JIT,JIT将JIT作为字节发出(生成)代码(虽然JIT可以将代码从自己的函数复制到生成的代码中或者发出对支持函数的调用,但是会发出预取作为热点中的字节)。我们怎样才能发现它是如何排放的?简单的在线方式是找到一些jdk8u的在线可搜索副本(或cross-reference like metager中的更好),例如在github上:https://github.com/JetBrains/jdk8u_hotspot并搜索prefetchprefetch emitprefetchrlir_prefetchr。有一些相关的结果:

c1 compiler中JVM LIR / jdk8u_hotspot/src/cpu/x86/vm/assembler_x86.cpp中发出的实际字节数:

void Assembler::prefetch_prefix(Address src) {
  prefix(src);
  emit_int8(0x0F);
}

void Assembler::prefetchnta(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rax, src); // 0, src
}

void Assembler::prefetchr(Address src) {
  assert(VM_Version::supports_3dnow_prefetch(), "must support");
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x0D);
  emit_operand(rax, src); // 0, src
}

void Assembler::prefetcht0(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rcx, src); // 1, src
}

void Assembler::prefetcht1(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rdx, src); // 2, src
}

void Assembler::prefetcht2(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rbx, src); // 3, src
}

void Assembler::prefetchw(Address src) {
  assert(VM_Version::supports_3dnow_prefetch(), "must support");
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x0D);
  emit_operand(rcx, src); // 1, src
}

c1 LIR中的用法:src/share/vm/c1/c1_LIRAssembler.cpp

void LIR_Assembler::emit_op1(LIR_Op1* op) {
  switch (op->code()) { 
...
    case lir_prefetchr:
      prefetchr(op->in_opr());
      break;

    case lir_prefetchw:
      prefetchw(op->in_opr());
      break;

现在我们知道the opcode lir_prefetchr and can search for itOpenGrok xreflir_prefetchw,在src/share/vm/c1/c1_LIR.cpp

中找到唯一的例子
void LIR_List::prefetch(LIR_Address* addr, bool is_store) {
  append(new LIR_Op1(
            is_store ? lir_prefetchw : lir_prefetchr,
            LIR_OprFact::address(addr)));
}

还有其他地方可以定义预取指令(对于C2,noted by apangin),the src/cpu/x86/vm/x86_64.ad

// Prefetch instructions. ...
instruct prefetchr( memory mem ) %{
  predicate(ReadPrefetchInstr==3);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHR $mem\t# Prefetch into level 1 cache" %}
  ins_encode %{
    __ prefetchr($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchrNTA( memory mem ) %{
  predicate(ReadPrefetchInstr==0);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHNTA $mem\t# Prefetch into non-temporal cache for read" %}
  ins_encode %{
    __ prefetchnta($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchrT0( memory mem ) %{
  predicate(ReadPrefetchInstr==1);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHT0 $mem\t# prefetch into L1 and L2 caches for read" %}
  ins_encode %{
    __ prefetcht0($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchrT2( memory mem ) %{
  predicate(ReadPrefetchInstr==2);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHT2 $mem\t# prefetch into L2 caches for read" %}
  ins_encode %{
    __ prefetcht2($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchwNTA( memory mem ) %{
  match(PrefetchWrite mem);
  ins_cost(125);

  format %{ "PREFETCHNTA $mem\t# Prefetch to non-temporal cache for write" %}
  ins_encode %{
    __ prefetchnta($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

// Prefetch instructions for allocation.

instruct prefetchAlloc( memory mem ) %{
  predicate(AllocatePrefetchInstr==3);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHW $mem\t# Prefetch allocation into level 1 cache and mark modified" %}
  ins_encode %{
    __ prefetchw($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchAllocNTA( memory mem ) %{
  predicate(AllocatePrefetchInstr==0);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHNTA $mem\t# Prefetch allocation to non-temporal cache for write" %}
  ins_encode %{
    __ prefetchnta($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchAllocT0( memory mem ) %{
  predicate(AllocatePrefetchInstr==1);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHT0 $mem\t# Prefetch allocation to level 1 and 2 caches for write" %}
  ins_encode %{
    __ prefetcht0($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchAllocT2( memory mem ) %{
  predicate(AllocatePrefetchInstr==2);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHT2 $mem\t# Prefetch allocation to level 2 cache for write" %}
  ins_encode %{
    __ prefetcht2($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}