The reason behind the back-off strategy in a spinlock

Time: 2019-10-29 04:46:10

Tags: c++ assembly cpu-architecture lock-free compare-and-swap

I'm looking at the spinlock implementation in the HotSpot JVM in OpenJDK 12. Here is how it is implemented (comments preserved):

// Polite TATAS spinlock with exponential backoff - bounded spin.
// Ideally we'd use processor cycles, time or vtime to control
// the loop, but we currently use iterations.
// All the constants within were derived empirically but work over
// over the spectrum of J2SE reference platforms.
// On Niagara-class systems the back-off is unnecessary but
// is relatively harmless.  (At worst it'll slightly retard
// acquisition times).  The back-off is critical for older SMP systems
// where constant fetching of the LockWord would otherwise impair
// scalability.
//
// Clamp spinning at approximately 1/2 of a context-switch round-trip.
// See synchronizer.cpp for details and rationale.

int Monitor::TrySpin(Thread * const Self) {
  if (TryLock())    return 1;
  if (!os::is_MP()) return 0;

  int Probes  = 0;
  int Delay   = 0;
  int SpinMax = 20;
  for (;;) {
    intptr_t v = _LockWord.FullWord;
    if ((v & _LBIT) == 0) {
      if (Atomic::cmpxchg (v|_LBIT, &_LockWord.FullWord, v) == v) {
        return 1;
      }
      continue;
    }

    SpinPause();

    // Periodically increase Delay -- variable Delay form
    // conceptually: delay *= 1 + 1/Exponent
    ++Probes;
    if (Probes > SpinMax) return 0;

    if ((Probes & 0x7) == 0) {
      Delay = ((Delay << 1)|1) & 0x7FF;
      // CONSIDER: Delay += 1 + (Delay/4); Delay &= 0x7FF ;
    }

    // Stall for "Delay" time units - iterations in the current implementation.
    // Avoid generating coherency traffic while stalled.
    // Possible ways to delay:
    //   PAUSE, SLEEP, MEMBAR #sync, MEMBAR #halt,
    //   wr %g0,%asi, gethrtime, rdstick, rdtick, rdtsc, etc. ...
    // Note that on Niagara-class systems we want to minimize STs in the
    // spin loop.  N1 and brethren write-around the L1$ over the xbar into the L2$.
    // Furthermore, they don't have a W$ like traditional SPARC processors.
    // We currently use a Marsaglia Shift-Xor RNG loop.
    if (Self != NULL) {
      jint rv = Self->rng[0];
      for (int k = Delay; --k >= 0;) {
        rv = MarsagliaXORV(rv);
        if (SafepointMechanism::should_block(Self)) return 0;
      }
      Self->rng[0] = rv;
    } else {
      Stall(Delay);
    }
  }
}

Link to source
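
To see how far the back-off actually gets before `TrySpin` gives up, the `Probes`/`Delay` bookkeeping can be replayed on its own. This is only a sketch: `delay_schedule` and `total_stall` are hypothetical helpers, not HotSpot functions, and they assume the lock stays held for the whole call, so the CAS path is never taken.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of HotSpot): replays the Probes/Delay
// bookkeeping of Monitor::TrySpin for a lock that is never released and
// records the Delay value used for the stall after each probe.
std::vector<int> delay_schedule(int spin_max) {
  assert(spin_max >= 0);
  std::vector<int> delays;
  int probes = 0;
  int delay = 0;
  for (;;) {
    ++probes;
    if (probes > spin_max) break;          // TrySpin returns 0 at this point
    if ((probes & 0x7) == 0) {
      delay = ((delay << 1) | 1) & 0x7FF;  // 0, 1, 3, 7, ... capped at 2047
    }
    delays.push_back(delay);
  }
  return delays;
}

// Total number of stall-loop iterations over one failed TrySpin call.
long total_stall(int spin_max) {
  long sum = 0;
  for (int d : delay_schedule(spin_max)) sum += d;
  return sum;
}
```

With `SpinMax = 20`, `Delay` is updated only at probes 8 and 16, so it takes the values 0, 1 and 3, and the stall loop runs 23 iterations in total before the function returns 0.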

Atomic::cmpxchg is implemented on x86 as

template<>
template<typename T>
inline T Atomic::PlatformCmpxchg<8>::operator()(T exchange_value,
                                                T volatile* dest,
                                                T compare_value,
                                                atomic_memory_order /* order */) const {
  STATIC_ASSERT(8 == sizeof(T));
  __asm__ __volatile__ ("lock cmpxchgq %1,(%3)"
                        : "=a" (exchange_value)
                        : "r" (exchange_value), "a" (compare_value), "r" (dest)
                        : "cc", "memory");
  return exchange_value;
}

Link to source
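
For comparison, the same test-and-test-and-set shape can be sketched in portable C++ with `std::atomic`. The names `kLockBit`, `cmpxchg` and `try_spin` are illustrative stand-ins rather than HotSpot code. The sketch leaves out `SpinPause` and the stall loop and, unlike `TrySpin`, counts a failed CAS as a probe. Mainstream x86-64 compilers typically lower `compare_exchange_strong` to a `lock cmpxchg`, but that is an assumption about the toolchain and the standard does not guarantee it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for _LBIT; the name is illustrative only.
constexpr std::intptr_t kLockBit = 1;

// Mimics the Atomic::cmpxchg(exchange, dest, compare) contract used above:
// returns the value that was in *dest before the operation.
std::intptr_t cmpxchg(std::intptr_t exchange_value,
                      std::atomic<std::intptr_t>* dest,
                      std::intptr_t compare_value) {
  std::intptr_t observed = compare_value;
  // On failure compare_exchange_strong stores the current value in observed;
  // on success observed still equals compare_value, which was the old value.
  dest->compare_exchange_strong(observed, exchange_value);
  return observed;
}

// Bounded test-and-test-and-set: read first, attempt the CAS only when the
// lock bit looks clear. Returns true if the lock was acquired.
bool try_spin(std::atomic<std::intptr_t>* word, int max_probes) {
  assert(max_probes >= 0);
  for (int probes = 0; probes < max_probes; ++probes) {
    std::intptr_t v = word->load(std::memory_order_relaxed);  // plain read
    if ((v & kLockBit) == 0 && cmpxchg(v | kLockBit, word, v) == v) {
      return true;
    }
  }
  return false;
}
```

The plain load is the "test" part: while the lock bit is set, the loop only reads the word and never issues the locked instruction.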

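What I don't understand is the reason for the back-off on "older SMP" systems. The comment says:

"The back-off is critical for older SMP systems where constant fetching of the LockWord would otherwise impair scalability."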

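The reason I can imagine is that on older SMP systems, fetching and then CAS-ing the LockWord always asserts a bus lock (rather than a cache lock), as the Intel manual, Vol. 3, Section 8.1.4, says: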

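"For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus."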

Is that the actual reason? Or is it something else?

0 answers