Question

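If I want to process the data in a std::vector with SSE, I need 16-byte alignment. How can I achieve that? Do I need to write my own allocator? Or is the default allocator already aligned to 16-byte boundaries?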

Answer 1

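You should use a custom allocator with std:: containers such as vector. I can't remember who wrote the one below, but I have used it for some time and it seems to work (you may need to change _aligned_malloc to _mm_malloc, depending on the compiler/platform):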

#ifndef ALIGNMENT_ALLOCATOR_H
#define ALIGNMENT_ALLOCATOR_H

#include <stdlib.h>
#include <malloc.h>
#include <cstddef>  // std::size_t, std::ptrdiff_t
#include <new>      // placement new, std::bad_alloc

template <typename T, std::size_t N = 16>
class AlignmentAllocator {
public:
  typedef T value_type;
  typedef std::size_t size_type;
  typedef std::ptrdiff_t difference_type;

  typedef T * pointer;
  typedef const T * const_pointer;

  typedef T & reference;
  typedef const T & const_reference;

  public:
  inline AlignmentAllocator () throw () { }

  template <typename T2>
  inline AlignmentAllocator (const AlignmentAllocator<T2, N> &) throw () { }

  inline ~AlignmentAllocator () throw () { }

  inline pointer address (reference r) {
    return &r;
  }

  inline const_pointer address (const_reference r) const {
    return &r;
  }

  inline pointer allocate (size_type n) {
    // _aligned_malloc returns NULL on failure; an allocator must throw instead.
    pointer p = (pointer)_aligned_malloc(n*sizeof(value_type), N);
    if (!p)
      throw std::bad_alloc();
    return p;
  }

  inline void deallocate (pointer p, size_type) {
    _aligned_free (p);
  }

  inline void construct (pointer p, const value_type & value) {
     new (p) value_type (value);
  }

  inline void destroy (pointer p) {
    p->~value_type ();
  }

  inline size_type max_size () const throw () {
    return size_type (-1) / sizeof (value_type);
  }

  template <typename T2>
  struct rebind {
    typedef AlignmentAllocator<T2, N> other;
  };

  bool operator!=(const AlignmentAllocator<T,N>& other) const  {
    return !(*this == other);
  }

  // Returns true if and only if storage allocated from *this
  // can be deallocated from other, and vice versa.
  // Always returns true for stateless allocators.
  bool operator==(const AlignmentAllocator<T,N>& other) const {
    return true;
  }
};

#endif

Use it like this (change 16 to another alignment if needed):

std::vector<T, AlignmentAllocator<T, 16> > bla;

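However, this only ensures that the memory block used by std::vector is 16-byte aligned. If sizeof(T) is not a multiple of 16, some of the elements will not be aligned. Depending on your data type, this may not be a problem. If T is int (4 bytes), only load elements whose index is a multiple of 4. If it is double (8 bytes), only multiples of 2, and so on.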
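The index rule above can be checked directly. The following is a minimal sketch, not part of the original answer: aligned_indices is a hypothetical helper name, and it assumes the buffer passed in already starts on a 16-byte boundary (as the vector's buffer does with the allocator above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the indices i in [0, count) for which
// &first[i] lies on a 16-byte boundary.
template <typename T>
std::vector<std::size_t> aligned_indices(const T* first, std::size_t count) {
  assert(first != nullptr);
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < count; ++i) {
    // Compare the element's address, as an integer, against the boundary.
    if (reinterpret_cast<std::uintptr_t>(first + i) % 16 == 0) {
      result.push_back(i);
    }
  }
  return result;
}
```

On a platform where int is 4 bytes and double is 8 bytes, calling this on a 16-byte-aligned buffer of 8 ints gives indices 0 and 4, and on a buffer of 4 doubles gives indices 0 and 2.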

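The real issue is when you use classes as T, in which case you have to specify your alignment requirements in the class itself (again, this may differ depending on the compiler; the example is for GCC):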

class __attribute__ ((aligned (16))) Foo {
    __attribute__ ((aligned (16))) double u[2];
};
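For compilers that support C++11, the standard alignas specifier is a portable alternative to the GCC attribute. This is a sketch under that assumption; AlignedFoo and all_elements_aligned are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counterpart of Foo, using standard C++11 alignas
// instead of the compiler-specific attribute.
struct alignas(16) AlignedFoo {
  double u[2];
};

// The compiler rounds sizeof up to a multiple of the alignment,
// so consecutive array elements stay on 16-byte boundaries.
static_assert(alignof(AlignedFoo) == 16, "type must be 16-byte aligned");
static_assert(sizeof(AlignedFoo) % 16 == 0, "size must be a multiple of 16");

// Checks every element of an array of AlignedFoo at run time.
inline bool all_elements_aligned(const AlignedFoo* first, std::size_t count) {
  assert(first != nullptr);
  for (std::size_t i = 0; i < count; ++i) {
    if (reinterpret_cast<std::uintptr_t>(first + i) % 16 != 0) return false;
  }
  return true;
}
```

Note that the type's alignment only helps inside a std::vector if the allocator also returns suitably aligned memory.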

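We're almost done! If you use Visual C++ (at least version 2010), you won't be able to use a std::vector with classes whose alignment you specified, because of std::vector::resize.

When compiling, if you get the following error: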

C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector(870):
error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned

You will have to hack your std::vector header file:

Locate the vector header file [C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector]
Locate the void resize( _Ty _Val ) method [line 870 on VC2010]
Change it to void resize( const _Ty& _Val ).

Answer 2

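The C++ standard requires allocation functions (malloc() and operator new()) to allocate memory suitably aligned for any standard type. Since these functions do not receive the alignment requirement as an argument, in practice this means that the alignment of all allocations is the same, namely that of the standard type with the largest alignment requirement, which is often long double and/or long long (see the boost max_align union).

Vector instructions, such as SSE and AVX, have stronger alignment requirements (16-byte alignment for 128-bit access and 32-byte alignment for 256-bit access) than those provided by the standard C++ allocation functions. posix_memalign() or memalign() can be used to satisfy such allocations with stronger alignment requirements.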
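A minimal sketch of the posix_memalign() route, assuming a POSIX platform (it is not available on Windows). The helper names aligned_alloc_bytes, aligned_free_bytes and is_on_boundary are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX, not standard C++)

// Hypothetical helper: returns `bytes` of storage aligned to `alignment`,
// or nullptr on failure. posix_memalign requires `alignment` to be a
// power of two and a multiple of sizeof(void*).
inline void* aligned_alloc_bytes(std::size_t alignment, std::size_t bytes) {
  void* p = nullptr;
  if (posix_memalign(&p, alignment, bytes) != 0) return nullptr;
  return p;
}

// Memory obtained from posix_memalign is released with plain free().
inline void aligned_free_bytes(void* p) { free(p); }

// True when p sits on an n-byte boundary (n must be a power of two).
inline bool is_on_boundary(const void* p, std::size_t n) {
  assert(n != 0 && (n & (n - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}
```

Requesting 16 gives storage suitable for 128-bit SSE loads and 32 for 256-bit AVX loads. To use this with std::vector, the calls would still have to be wrapped in an allocator's allocate and deallocate.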

Answer 3

Instead of writing your own allocator, as suggested before, you can use boost::alignment::aligned_allocator with std::vector like this:

#include <vector>
#include <boost/align/aligned_allocator.hpp>

template <typename T>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, 16>>;

Answer 4

Write your own allocator. allocate and deallocate are the important ones. Here is one example:

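// Requires <cstdlib>, <cstdint>, <cstddef> and <new>.
pointer allocate( size_type n, const void * /*hint*/ = 0 )
{
    // n is an element count, so guard the byte-size computation against overflow.
    if( n > ( SIZE_MAX - 16 ) / sizeof( T ) )
        throw std::bad_alloc();

    // Over-allocate by 16 bytes so that an aligned address always fits.
    char * p = (char*)malloc( n * sizeof( T ) + 16 );

    if( !p )
        throw std::bad_alloc();

    // Distance (1..16) to the next 16-byte boundary. Computed on uintptr_t,
    // because casting a pointer to int truncates on 64-bit targets.
    std::size_t difference = 16 - ( (std::uintptr_t)p & 15 );

    p += difference;
    p[ -1 ] = (char)difference;   // remember the offset for deallocate()

    return (T*)p;
}

void deallocate( pointer p, size_type /*num*/ )
{
    char * pBuffer = (char*)p;

    // Step back by the stored offset to recover the pointer malloc returned.
    free( (void*)( pBuffer - pBuffer[ -1 ] ) );
}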

Answer 5

Short answer:

Yes, if sizeof(T)*vector.size() > 16.
(Assuming your vector uses the normal allocator.)

Caveat: only as long as alignof(std::max_align_t) >= 16, as this is the maximum alignment.

Long answer:

Updated 25/Aug/2017 for the new standard n4659.

If memory is aligned for anything greater than 16, it is also aligned correctly for 16.

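6.11 Alignment (paragraphs 4/5)

Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.

Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.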

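new and new[] return values that are aligned so that objects are correctly aligned for their size:

8.3.4 New (paragraph 17)

[Note: when the allocation function returns a value other than null, it must be a pointer to a block of storage in which space for the object has been reserved. The block of storage is assumed to be appropriately aligned and of the requested size. The address of the created object will not necessarily be the same as that of the block if the object is an array. - end note]

Note that most systems have a maximum alignment. Dynamically allocated memory does not need to be aligned to a value greater than this.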

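6.11 Alignment (paragraph 2)

A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (21.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject.

Thus, as long as the memory allocated for your vector is larger than 16 bytes, it will be correctly aligned on 16-byte boundaries.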
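The caveat about alignof(std::max_align_t) can be tested on a given platform. This is a rough sketch; default_new_covers, starts_on_boundary and vector_buffer_is_16_aligned are hypothetical names, and a run-time check on one buffer is a sanity test rather than a proof.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when the largest fundamental alignment on this platform is at
// least `required` bytes, which is the precondition stated in the caveat.
constexpr bool default_new_covers(std::size_t required) {
  return alignof(std::max_align_t) >= required;
}

// True when p sits on an n-byte boundary (n must be a power of two).
inline bool starts_on_boundary(const void* p, std::size_t n) {
  assert(n != 0 && (n & (n - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Allocates a vector with the default allocator and inspects its buffer.
inline bool vector_buffer_is_16_aligned(std::size_t count) {
  std::vector<float> v(count);
  return starts_on_boundary(v.data(), 16);
}
```

If default_new_covers(16) is false on the target platform, the default allocator should not be relied on for SSE loads, and one of the aligned allocators from the other answers is the safer choice.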

Answer 6

Use declspec(align(x,y)) as explained in the Intel vectorization tutorial, http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.pdf

Answer 7

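Don't assume anything about STL containers. Their interface and behaviour are defined, but not what lies behind them. If you need raw access, you will have to write your own implementation that follows the rules you want.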

Answer 8

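The standard mandates that new and new[] return data aligned for any data type, which should include SSE. Whether or not MSVC actually follows that rule is another question.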

How is vector data aligned?
