Simple thread-safe and fast memory pool implementation?

Date: 2012-09-17 21:40:38

Tags: c++ multithreading performance lock-free

After rethinking the design, and with some input from paddy, I came up with something like the following, but I'm wondering about its correctness; when I run it, it seems fine... The idea is that the preallocated objects inherit from this:

struct Node
{
    void* pool;
};

This way we inject into every allocated object a pointer to the pool it came from, so it can be freed back there later. Then we have:

template<class T, int thesize>
struct MemPool
{
    T* getNext();
    void free(T* ptr);

    struct ThreadLocalMemPool
    {
        T* getNextTL();
        void freeTL(T* ptr);

        int size;
        vector<T*> buffer;
        vector<int> freeList;
        int freeListIdx;
        int bufferIdx;
        ThreadLocalMemPool* nextTLPool; //within a thread's context, a linked list
    };

    int size;
    threadlocal ThreadLocalMemPool* tlPool; //one of these per thread
};
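
For illustration only (this is not part of the question's code), a pooled type under this scheme would just derive from Node, and the outer pool would use the injected back-pointer to route free() calls to whichever thread-local pool the object came from; a rough sketch:

struct Cat : Node
{
    int age;
};

template<class T, int thesize>
T* MemPool<T, thesize>::getNext()
{
    //(creation of tlPool on first use omitted)
    T* obj = tlPool->getNextTL();
    obj->pool = tlPool;   //stamp the owning thread-local pool into the object
    return obj;
}

template<class T, int thesize>
void MemPool<T, thesize>::free(T* ptr)
{
    //follow the back-pointer; the object may belong to another thread's pool
    static_cast<ThreadLocalMemPool*>(ptr->pool)->freeTL(ptr);
}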

So basically I say MemPool<Cat, 100> and it gives me a mempool which, for each thread that calls getNext on it, instantiates a thread-local mempool. Internally I round the size up to the nearest power of two so the modulo is cheap (omitted here for simplicity; one common way to do that rounding is sketched after the code below). Because getNext() is local to each thread it doesn't need locking, and I try to use atomics for the free part, like so:

T* ThreadLocalMemPool::getNextTL()
{
    int iHead = ++bufferIdx % size;
    int iTail = freeListIdx % size;

    if (iHead != iTail)  // If head reaches tail, the free list is empty.
    {
        int & idx = freeList[iHead];
        while (idx == DIRTY) {}
        return buffer[idx];
    }
    else
    {
        bufferIdx--; //we will recheck next time
        if (nextTLPool)
            return nextTLPool->getNextTL();
        else
            //set nextTLPool to a new ThreadLocalMemPool and return getNextTL() from it..
    }
}

void ThreadLocalMemPool::freeTL(T* ptr)
{
    //the outer struct handles calling this in the right ThreadLocalMemPool

    //we compute the index in the pool from which this pool came from by subtracting from
    //its address the address of the first pointer in this guys buffer
    int idx = computeAsInComment(ptr);

    int oldListIdx = atomic_increment_returns_old_value(freeListIdx);
    freeList[oldListIdx % size] = idx;
}
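
As an aside, the power-of-two rounding mentioned above (which the question omits for brevity) is typically done with a bit trick along these lines; this is purely illustrative and not part of the original code:

//illustrative only: round a size up to the next power of two, so that
//"x % size" can be replaced with the cheaper "x & (size - 1)"; assumes v > 0
inline unsigned roundUpToPow2(unsigned v)
{
    v--;
    v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
    v |= v >> 8;  v |= v >> 16;
    return v + 1;
}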

Now, the idea is that freeListIdx will always trail bufferIdx in the pool, because (assuming correct usage) you can't free more than you have allocated. Calls to free synchronize the order in which buffer indices are returned to the free list, and getNext picks them up as it cycles back around. I've been thinking this over and don't see anything semantically wrong with the logic; does it look sound, or is there something subtle that could break it?

1 Answer:

Answer (score: 3)

Thread safety requires locking. If you want to relax that, you need the constraint that only one thread ever uses the pool. You can extend this to two threads if you use the circular free list I describe below, on the condition that one thread is responsible for allocation and the other for deallocation.

As for using a vector without any other management: bad idea... As soon as you start getting fragmented, your allocations will suffer.

A good way to implement this is to allocate just one large block of T, and then build a circular queue large enough to point at each of those blocks. That is your 'free list'. You might as well just use indices. If you limit each pool to 65536 items, you can use unsigned short to save space (actually 65535, to allow for efficient circular-queue management).

By using a circular queue you get constant-time allocation and deallocation, regardless of fragmentation. You also know when a pool is full (i.e. the free list is empty), at which point you can create another pool. Obviously, when you create a pool, you need to fill the free list.

So your class would look something like this:

template<class T, size_t initSize>
class MemPool
{
    vector<T> poolBuffer;              // The memory pool
    vector<unsigned short> freeList;   // Ring-buffer (indices of free items)
    unsigned short nHead, nTail;       // Ring-buffer management
    int nCount;                        // Number of elements in ring-buffer
    MemPool<T,initSize> *nextPool;     // For expanding memory pool

    // etc...
};
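
The answer leaves the initialization implicit, but a minimal constructor sketch (illustrative, assuming the members above) would simply pre-fill the free list with every index so the pool starts out completely free; note that initSize must stay below 65536, because the index 0xffff is reserved as the "empty" sentinel used later:

MemPool()
    : poolBuffer( initSize )
    , freeList( initSize )
    , nHead( 0 )
    , nTail( 0 )
    , nCount( (int)initSize )
    , nextPool( NULL )
{
    // Every slot starts out free, so the ring holds indices 0..initSize-1.
    for( size_t i = 0; i < initSize; ++i )
        freeList[i] = (unsigned short)i;
}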

Now, for the locking. If you have access to atomic increment and decrement instructions, and you are reasonably careful, you can maintain the free list with thread safety. The only mutex-style locking required is when you need to allocate a new memory pool.

I've changed my original thinking here. You need two atomic operations, and you need a reserved index value (0xffff) to spin on for the non-atomic operations on the queue:

// I'll have the functions atomic_incr() and atomic_decr().  The assumption here
// is that they do the operation and return the value as it was prior to the
// increment/decrement.  I'll also assume they work correctly for both int and
// unsigned short types.
unsigned short atomic_incr( unsigned short & );
int atomic_incr( int & );
int atomic_decr( int & );
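
These are only declared in the answer; as one possible implementation (an assumption, not part of the original), the GCC/Clang __sync builtins have exactly the "return the value prior to the operation" semantics described above (with C++11 you could instead make the counters std::atomic<> and use fetch_add/fetch_sub):

// Sketch only: requires a GCC-compatible compiler; each call returns the old value.
inline unsigned short atomic_incr( unsigned short &v ) { return __sync_fetch_and_add( &v, 1 ); }
inline int            atomic_incr( int &v )            { return __sync_fetch_and_add( &v, 1 ); }
inline int            atomic_decr( int &v )            { return __sync_fetch_and_sub( &v, 1 ); }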

So allocation looks like this:

T* alloc()
{
    // Check the queue size.  If it's zero (or less) we need to pass on
    // to the next pool and/or allocate a new one.
    if( nCount <= 0 ) {
        return alloc_extend();
    }

    int count = atomic_decr(nCount);
    if( count <= 0 ) {
        T *mem = alloc_extend();
        atomic_incr(nCount);     // undo
        return mem;
    }

    // We are guaranteed that there is at least 1 element in the list for us.
    // This will overflow naturally to achieve modulo by 65536.  You can only
    // deal with queue sizes that are a power of 2.  If you want 32768 values,
    // for example, you must do this: head &= 0x7fff;
    unsigned short head = atomic_incr(nHead);

    // Spin until the element is valid (use a reference)
    unsigned short & idx = freeList[head];
    while( idx == 0xffff );

    // Grab the pool item, and blitz the index from the queue
    T * mem = &poolBuffer[idx];
    idx = 0xffff;

    return mem;
};

The above uses a new private member function:

T * alloc_extend()
{
    if( nextPool == NULL ) {
        acquire_mutex_here();
        if( nextPool == NULL ) nextPool = new MemPool<T, initSize>;
        release_mutex_here();
        if( nextPool == NULL ) return NULL;
    }
    return nextPool->alloc();
}

And when you want to free:

void free(T* mem)
{
    // Find the right pool to free from.
    if( mem < &poolBuffer.front() || mem > &poolBuffer.back() )
    {
        if( nextPool ) nextPool->free(mem);
        return;
    }

    // You might want to maintain a bitset that indicates whether the memory has
    // actually been allocated so you don't corrupt your pool here, but I won't
    // do that in this example...

    // Work out the index.  Hope my pointer arithmetic is correct here.
    unsigned short idx = (unsigned short)(mem - &poolBuffer.front());

    // Push index back onto the queue.  As with alloc(), you might want to
    // use a mask on the tail to achieve modulo.
    int tail = atomic_incr(nTail);
    freeList[tail] = idx;

    // Don't need to check the list size.  We assume everything is sane. =)
    atomic_incr(nCount);
}

Note that I use the value 0xffff effectively as NULL. The setting, clearing, and spinning on this value are there to prevent a race condition: you cannot guarantee that it is safe to leave old data in the queue when multiple threads may be calling alloc while others are calling free. Your queue will be cycling through, but the data in it may not have been set yet.

Of course, you could use pointers instead of indices, but that is 4 bytes each (or 8 bytes in a 64-bit application), and the memory overhead may not be worth it, depending on the size of the data you are pooling. Personally, I would use pointers, but for some reason indices seemed easier in this answer.
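
To tie the pieces together, here is a minimal usage sketch under the assumptions made above (a constructor that pre-fills the free list, as sketched earlier, and alloc()/free() made public; Cat is just the illustrative type from the question):

// Hypothetical usage of the pool described in this answer.
struct Cat { int age; };

int main()
{
    MemPool<Cat, 1024> pool;   // one contiguous block of 1024 Cats

    Cat* c = pool.alloc();     // constant time: pop an index off the ring
    if( c ) {
        c->age = 3;
        pool.free( c );        // constant time: push the index back
    }
    return 0;
}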