Question

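I'm working on a single-producer, single-consumer ring buffer implementation. I have two requirements:

1) Align a single heap-allocated instance of the ring buffer to a cache line.

2) Align a field within the ring buffer to a cache line (to prevent false sharing).

My class looks something like: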

#define CACHE_LINE_SIZE 64  // To be used later.

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
  ....

private:
  std::atomic<int64_t> publisher_sequence_ ;
  int64_t cached_consumer_sequence_;
  T* events_;
  std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.

};

Let me first tackle point 1, i.e. aligning a single heap-allocated instance of the class. There are a few ways:

1) Use the C++11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class alignas(CACHE_LINE_SIZE) RingBuffer {
public:
  ....

private:
  // All the private fields.

};
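A minimal sketch of what alignas on the class actually buys you, using a simplified hypothetical stand-in named AlignedRing rather than the real RingBuffer: the specifier raises alignof() and pads sizeof() to a multiple of 64, and the compiler honours that for static and stack objects. It does not, by itself, make a C++11/14 plain new return 64-byte aligned memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#define CACHE_LINE_SIZE 64

// Hypothetical, simplified stand-in for RingBuffer.
template <typename T, std::uint64_t num_events>
class alignas(CACHE_LINE_SIZE) AlignedRing {
 public:
  std::atomic<std::int64_t> publisher_sequence_{0};
};

// alignas on the class raises alignof(); the compiler honours it for
// static and automatic (stack) objects.
static_assert(alignof(AlignedRing<int, 1024>) == CACHE_LINE_SIZE,
              "class alignment should equal the cache line size");

// sizeof is rounded up to a multiple of the alignment, so arrays stay aligned.
static_assert(sizeof(AlignedRing<int, 1024>) % CACHE_LINE_SIZE == 0,
              "size should be a multiple of the cache line size");

// True if p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::uintptr_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Creates an instance on the stack and checks its address.
inline bool stack_instance_is_aligned() {
  AlignedRing<int, 1024> ring;
  return is_aligned(&ring, CACHE_LINE_SIZE);
}
```

Printing addresses through a helper like is_aligned is also a quick way to re-check the intermittent misalignment mentioned further down.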

2) Use posix_memalign(..) + placement new(..) without altering the class definition. This suffers from not being platform independent:

 void* buffer;
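 int rc = posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>));
 if (rc != 0) {
   // posix_memalign returns an error number and does not set errno, so report rc.
   fprintf(stderr, "posix_memalign did not work: %s\n", strerror(rc));
   abort();
 }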
 // Use placement new on a cache aligned buffer.
 auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();

3) Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {
public:
  ....

private:
  // All the private fields.

} __attribute__ ((aligned(CACHE_LINE_SIZE)));

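4) I tried to use the standardized aligned_alloc(..) function instead of posix_memalign(..), but GCC 4.8.1 on Ubuntu 12.04 could not find the definition in stdlib.h. (aligned_alloc comes from C11; C++ only adopted it as std::aligned_alloc in C++17.)

Are all of these guaranteed to do the same thing? My goal is cache-line alignment, so any method that has some limit on alignment (say, double word) will not do. Platform independence, which would point to using the standardized alignas(..), is a secondary goal.

I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit that could be below the cache line size on the machine. I can't reproduce this any more, but while printing addresses I think I did not always get 64-byte aligned addresses with alignas(..). In contrast, posix_memalign(..) seemed to always work. Since I can't reproduce it, maybe I made a mistake.

The second aim is to align a field within a class/struct to a cache line. I'm doing this to prevent false sharing. I have tried the following ways:

1) Use the C++11 alignas(..) specifier: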

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
};

2) Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
};

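Both of these methods seem to place consumer_sequence_ at an offset of 64 bytes from the start of the object, so whether consumer_sequence_ is actually cache aligned depends on whether the object itself is cache aligned. My question is: are there any better ways to do the same?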
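A sketch, using a hypothetical SpscLayout struct that mirrors the fields above, of one detail worth checking: alignas(CACHE_LINE_SIZE) on a member also raises the alignment of the enclosing class to 64. So the member's offset is 64 and every static or stack instance starts on a cache line without any class-level specifier; only heap allocation through a pre-C++17 new can break the absolute alignment.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

#define CACHE_LINE_SIZE 64

// Hypothetical layout mirroring the RingBuffer fields from the question.
template <typename T>
struct SpscLayout {
  std::atomic<std::int64_t> publisher_sequence_{0};
  std::int64_t cached_consumer_sequence_ = 0;
  T* events_ = nullptr;
  // Starts a new cache line, so consumer writes don't invalidate the line
  // that holds publisher_sequence_.
  alignas(CACHE_LINE_SIZE) std::atomic<std::int64_t> consumer_sequence_{0};
};

// The member's alignas propagates to the enclosing struct: no class-level
// alignas is needed for stack and static instances to start on a cache line.
static_assert(alignof(SpscLayout<int>) == CACHE_LINE_SIZE,
              "struct inherits its most-aligned member's alignment");

// Byte distance from the start of the object to consumer_sequence_.
template <typename T>
std::ptrdiff_t consumer_offset(const SpscLayout<T>& s) {
  return reinterpret_cast<const char*>(&s.consumer_sequence_) -
         reinterpret_cast<const char*>(&s);
}
```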

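EDIT: The reason aligned_alloc did not work on my machine was that I was on eglibc 2.15 (Ubuntu 12.04). It worked on a later version of eglibc.

From the man page: The function aligned_alloc() was added to glibc in version 2.16.

This makes it pretty useless for me, since I cannot require such a recent version of eglibc/glibc.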
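One way to live with both glibc versions is to choose the allocator at compile time. This is a sketch under assumptions: alloc_cache_aligned is a hypothetical helper name, it relies on glibc's __GLIBC__ / __GLIBC_MINOR__ macros, and the fallback branch is POSIX-only. Either way the block is released with free().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // aligned_alloc (glibc >= 2.16), posix_memalign, free

// Hypothetical helper: returns 64-byte aligned memory or nullptr.
void* alloc_cache_aligned(std::size_t size) {
  const std::size_t alignment = 64;
  // C11 (as first published) requires the size to be a multiple of the
  // alignment for aligned_alloc, so round the request up.
  const std::size_t padded = (size + alignment - 1) / alignment * alignment;
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 16))
  return aligned_alloc(alignment, padded);
#else
  // Older glibc or non-glibc POSIX systems.
  void* p = nullptr;
  return posix_memalign(&p, alignment, padded) == 0 ? p : nullptr;
#endif
}
```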

Answer 1

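Unfortunately, the best I have found is allocating extra space and then using the "aligned" part. So the RingBuffer new can request an extra 64 bytes and then return the first 64-byte-aligned part of that. It wastes space but will give the alignment you need. You will likely need to store the actual allocation address in the memory just before the returned pointer, so that delete can deallocate it.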

[Memory returned][ptr to start of memory][aligned memory][extra memory]

(assuming no inheritance from RingBuffer) something like:

#include <memory>  // std::align

void * RingBuffer::operator new(size_t request)
{
     static const size_t ptr_alloc = sizeof(void *);
     static const size_t align_size = 64;
     const size_t request_size = request + align_size;
     const size_t needed = ptr_alloc + request_size;

     void * alloc = ::operator new(needed);
     // Skip the slot reserved for the saved pointer, then round up to 64 bytes.
     // std::align takes its pointer and space arguments by reference.
     void * ptr = static_cast<char *>(alloc) + ptr_alloc;
     size_t space = request_size;
     std::align(align_size, request, ptr, space); // cannot fail: align_size bytes of slack

     static_cast<void **>(ptr)[-1] = alloc; // save for delete calls to use
     return ptr;
}

void RingBuffer::operator delete(void * ptr)
{
    if (ptr) // deleting a null pointer is valid but a no-op; don't read ptr[-1]
    {
           void * alloc = static_cast<void **>(ptr)[-1];
           ::operator delete (alloc);
    }
}

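For the second requirement, having a data member of RingBuffer also be 64-byte aligned: if you know that the start of this is aligned, you can pad to force the alignment of data members.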
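A sketch of the padding idea, with a hypothetical PaddedRing struct: explicit char padding pushes consumer_sequence_ to byte 64 of the object without using any alignment specifier. It assumes the three leading members pack with no internal padding (true on common 64-bit ABIs), and the absolute address is only cache aligned if the object itself starts on a cache line, for example via an over-allocating operator new.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;

// Hypothetical layout: the pad_ array fills the rest of the first cache line.
struct PaddedRing {
  std::atomic<std::int64_t> publisher_sequence_;
  std::int64_t cached_consumer_sequence_;
  void* events_;
  char pad_[kCacheLine - sizeof(std::atomic<std::int64_t>) -
            sizeof(std::int64_t) - sizeof(void*)];
  std::atomic<std::int64_t> consumer_sequence_;
};

// Byte distance from the start of the object to consumer_sequence_.
inline std::ptrdiff_t padded_consumer_offset(const PaddedRing& r) {
  return reinterpret_cast<const char*>(&r.consumer_sequence_) -
         reinterpret_cast<const char*>(&r);
}
```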

Answer 2

The answer to your problem is std::aligned_storage. It can be used at the top level and for individual members of a class.
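A minimal sketch of std::aligned_storage used at the top level, assuming a hypothetical Widget type: the storage type carries the 64-byte alignment, and an object is placed into it with placement new. Like alignas, this only controls static and stack storage; allocating the storage type with a pre-C++17 plain new has the same heap caveat. (std::aligned_storage is deprecated in C++23 but still available.)

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical element type.
struct Widget {
  int id;
  explicit Widget(int i) : id(i) {}
};

// Raw, uninitialized storage with Widget's size and 64-byte alignment.
typedef std::aligned_storage<sizeof(Widget), 64>::type CacheAlignedStorage;

static_assert(alignof(CacheAlignedStorage) == 64, "storage is cache aligned");

// Static storage duration, so the compiler honours the 64-byte alignment.
static CacheAlignedStorage g_widget_storage;

// Constructs a Widget inside the aligned storage with placement new.
// Widget is trivially destructible, so no explicit destructor call is needed.
Widget* construct_widget(int id) {
  return new (&g_widget_storage) Widget(id);
}
```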

Answer 3

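After some more research, my thoughts are:

1) Like @TemplateRex pointed out, there does not seem to be a standard way to align to more than 16 bytes (alignof(std::max_align_t) on typical 64-bit platforms; larger "extended" alignments are implementation-defined). So even if we use the standardized alignas(..), there is no guarantee unless the alignment boundary is less than or equal to 16 bytes. I'll have to verify that it works as expected on the target platform.

2) __attribute__ ((aligned(#))) or alignas(..) cannot be used to align a heap-allocated object, as I suspected: new() doesn't do anything with these annotations. They seem to work for static objects or stack allocations, with the caveats from (1).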
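For comparison only, since it is a later language change and not available to the GCC 4.8.1 setup discussed here: C++17 added "aligned new" (P0035R4). For a type whose alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__, a plain new-expression calls operator new(std::size_t, std::align_val_t), so heap objects do honour alignas. CacheLine is a hypothetical type; this sketch requires a C++17 compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>

// Requires C++17 ("aligned new").
struct alignas(64) CacheLine {
  long value = 0;
};

// Only over-aligned types take the align_val_t path.
static_assert(alignof(CacheLine) > __STDCPP_DEFAULT_NEW_ALIGNMENT__,
              "CacheLine is over-aligned");

// The new-expression picks the aligned operator new; the matching aligned
// operator delete runs when the unique_ptr is destroyed.
std::unique_ptr<CacheLine> make_cache_line() {
  return std::unique_ptr<CacheLine>(new CacheLine());
}
```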

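posix_memalign(..) (non-standard) or aligned_alloc(..) (standardized, but I couldn't get it to work with GCC 4.8.1) + placement new(..) seems to be the solution. My solution for when I need platform-independent code is compiler-specific macros :)

3) Alignment of struct/class fields seems to work with both __attribute__ ((aligned(#))) and alignas(), as noted in the answers. Again, I think the caveats from (1) about alignment guarantees stand.

So my current solution is to use posix_memalign(..) + placement new(..) for aligning the heap-allocated instance of my class, since my target platform right now is Linux only. I am also using alignas(..) for aligning fields, since it's standardized and at least works on Clang and GCC. I'll be happy to change it if a better answer comes along.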
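Because posix_memalign + placement new leaves destruction and freeing to the caller, one way to package it is a std::unique_ptr with a custom deleter. This is a sketch under stated assumptions: make_aligned, AlignedDeleter and aligned_unique_ptr are hypothetical names, the alignment must be a power of two and a multiple of sizeof(void*) (posix_memalign's requirement), and the code is POSIX-only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Deleter for objects built with posix_memalign + placement new:
// run the destructor, then hand the block back to free().
template <typename T>
struct AlignedDeleter {
  void operator()(T* p) const {
    if (p != nullptr) {
      p->~T();
      std::free(p);
    }
  }
};

template <typename T>
using aligned_unique_ptr = std::unique_ptr<T, AlignedDeleter<T>>;

// Hypothetical helper: allocate on an `alignment`-byte boundary and construct
// T there. Returns an empty pointer if the allocation fails.
template <typename T, typename... Args>
aligned_unique_ptr<T> make_aligned(std::size_t alignment, Args&&... args) {
  void* raw = nullptr;
  if (posix_memalign(&raw, alignment, sizeof(T)) != 0) {
    return aligned_unique_ptr<T>();
  }
  try {
    return aligned_unique_ptr<T>(new (raw) T(std::forward<Args>(args)...));
  } catch (...) {
    std::free(raw);  // constructor threw: release the block, then rethrow
    throw;
  }
}
```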

Answer 4

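I don't know if this is the best way to align memory allocated with the new operator, but it is certainly very simple!

This is the way it is done in the thread sanitizer pass in GCC 6.1.0: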

#define ALIGNED(x) __attribute__((aligned(x)))

static char myarray[sizeof(myClass)] ALIGNED(64) ;
var = new(myarray) myClass;

Well, in sanitizer_common/sanitizer_internal_defs.h, it is also written:

// Please only use the ALIGNED macro before the type.
// Using ALIGNED after the variable declaration is not portable!

So I don't know why ALIGNED is used after the variable declaration here. But that is another story.
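The same static-buffer idea can be written with standard C++11 alignas placed at the front of the declaration, which also follows the "ALIGNED before the type" advice quoted above. A sketch with a hypothetical MyClass standing in for myClass:

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical class standing in for myClass.
struct MyClass {
  int x = 1;
};

// alignas leads the declaration and is standard C++11, not a GCC attribute.
alignas(64) static unsigned char my_storage[sizeof(MyClass)];

// Constructs MyClass in the aligned buffer (trivially destructible here).
MyClass* place_my_class() {
  return new (my_storage) MyClass();
}
```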

What is the recommended way of aligning memory in C++11?
