Question

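I have been using gcc's Intel-compatible builtins (such as __sync_fetch_and_add) for quite some time, wrapped in my own atomic template. The "__sync" functions are now officially considered "legacy".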

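C++11 supports std::atomic<> and its descendants, so it seems reasonable to use that instead: it makes my code standard-compliant, and the compiler should produce the best code either way, in a platform-independent manner, which is almost too good to be true. Incidentally, I would only have to text-replace atomic with std::atomic. There is a lot in std::atomic (re: memory models) that I don't really need, but the default parameters take care of that.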

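Now for the bad news. As it turns out, the generated code is, from what I can tell, utter crap, and not even atomic at all. Even a minimal example that increments a single atomic variable and outputs it has no fewer than 5 non-inlined function calls to ___atomic_flag_for_address, ___atomic_flag_wait_explicit, and __atomic_flag_clear_explicit (fully optimized), and on the other hand there is not a single atomic instruction in the generated executable.

What gives? There is of course always the possibility of a compiler bug, but with the huge number of reviewers and users, something this drastic is unlikely to go unnoticed. That means this is probably not a bug but intended behaviour.

What is the "rationale" behind so many function calls, and how is atomicity implemented without atomic instructions?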

As-simple-as-it-can-get example:

#include <atomic>

int main()
{
    std::atomic_int a(5);
    ++a;
    __builtin_printf("%d", (int)a);
    return 0;
}

This produces the following .s:

movl    $5, 28(%esp)     #, a._M_i
movl    %eax, (%esp)     # tmp64,
call    ___atomic_flag_for_address   #
movl    $5, 4(%esp)      #,
movl    %eax, %ebx       #, __g
movl    %eax, (%esp)     # __g,
call    ___atomic_flag_wait_explicit #
movl    %ebx, (%esp)     # __g,
addl    $1, 28(%esp)     #, MEM[(__i_type *)&a]
movl    $5, 4(%esp)      #,
call    _atomic_flag_clear_explicit  #
movl    %ebx, (%esp)     # __g,
movl    $5, 4(%esp)      #,
call    ___atomic_flag_wait_explicit #
movl    28(%esp), %esi   # MEM[(const __i_type *)&a], __r
movl    %ebx, (%esp)     # __g,
movl    $5, 4(%esp)      #,
call    _atomic_flag_clear_explicit  #
movl    $LC0, (%esp)     #,
movl    %esi, 4(%esp)    # __r,
call    _printf  #
(...)
.def    ___atomic_flag_for_address;   .scl 2; .type 32; .endef
.def    ___atomic_flag_wait_explicit; .scl 2; .type 32; .endef
.def    _atomic_flag_clear_explicit;  .scl 2; .type 32; .endef

...and the functions mentioned look, for example, like this in objdump:

004013c4 <__atomic_flag_for_address>:
mov    0x4(%esp),%edx
mov    %edx,%ecx
shr    $0x2,%ecx
mov    %edx,%eax
shl    $0x4,%eax
add    %ecx,%eax
add    %edx,%eax
mov    %eax,%ecx
shr    $0x7,%ecx
mov    %eax,%edx
shl    $0x5,%edx
add    %ecx,%edx
add    %edx,%eax
mov    %eax,%edx
shr    $0x11,%edx
add    %edx,%eax
and    $0xf,%eax
add    $0x405020,%eax
ret

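The others are somewhat simpler, but I cannot find a single instruction that would really be atomic (other than some spurious xchg, which is atomic on x86, but these seem to be NOP/padding, since it is xchg %ax,%ax following a ret).

I am absolutely not sure what such a rather complicated function is needed for, or how it is meant to make anything atomic.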

Answer 1

It is an inadequate compiler build.

Check your c++config.h. It should look like this, but on your installation it does not:

/* Define if builtin atomic operations for bool are supported on this host. */
#define _GLIBCXX_ATOMIC_BUILTINS_1 1

/* Define if builtin atomic operations for short are supported on this host.
   */
#define _GLIBCXX_ATOMIC_BUILTINS_2 1

/* Define if builtin atomic operations for int are supported on this host. */
#define _GLIBCXX_ATOMIC_BUILTINS_4 1

/* Define if builtin atomic operations for long long are supported on this
   host. */
#define _GLIBCXX_ATOMIC_BUILTINS_8 1

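Whether these macros are defined depends on configure tests that check whether the host machine supports the __sync_XXX functions. These tests are in libstdc++-v3/acinclude.m4, AC_DEFUN([GLIBCXX_ENABLE_ATOMIC_BUILTINS] ....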

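On your installation, the MEM[(__i_type *)&a] that -fverbose-asm put in the assembly file makes it evident that the compiler uses the macros from atomic_0.h, for example: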

#define _ATOMIC_LOAD_(__a, __x)                        \
  ({typedef __typeof__(_ATOMIC_MEMBER_) __i_type;                          \
    __i_type* __p = &_ATOMIC_MEMBER_;                      \
    __atomic_flag_base* __g = __atomic_flag_for_address(__p);          \
    __atomic_flag_wait_explicit(__g, __x);                 \
    __i_type __r = *__p;                           \
    atomic_flag_clear_explicit(__g, __x);                      \
    __r; })

With a properly built compiler and your example program, c++ -m32 -std=c++0x -S -O2 -march=core2 -fverbose-asm should produce something like this:

movl    $5, 28(%esp)    #, a.D.5442._M_i
lock addl   $1, 28(%esp)    #,
mfence
movl    28(%esp), %eax  # MEM[(const struct __atomic_base *)&a].D.5442._M_i, __ret
mfence
movl    $.LC0, (%esp)   #,
movl    %eax, 4(%esp)   # __ret,
call    printf  #

Answer 2

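There are two implementations: one that uses the __sync primitives and one that does not, plus a mixture of the two that uses only some of those primitives. Which one is selected depends on the macros _GLIBCXX_ATOMIC_BUILTINS_1, _GLIBCXX_ATOMIC_BUILTINS_2, _GLIBCXX_ATOMIC_BUILTINS_4 and _GLIBCXX_ATOMIC_BUILTINS_8.

The mixed implementation needs at least the first one; the fully atomic implementation needs all of them. Whether they are defined seems to depend on the target machine (they may not be defined for -march=i386 and should be defined for -march=i686).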

Why does GCC std::atomic increment generate inefficient non-atomic assembly?
