Question

I want to emulate a thread-local variable for non-static members, like this:

template< typename T, unsigned int tNumThread >
class ThreadLocal
{
private:

protected:
    T mData[tNumThread];

    unsigned int _getThreadIndex()
    {
        return ...; // I have a thread pool and each thread has an index from 0 to n
    }

public:
    ThreadLocal() {};
    ~ThreadLocal() {};

    // operator-> must return a pointer (or an object that itself has operator->)
    T* operator ->()
    {
        return &mData[_getThreadIndex()];
    }
    ...
};

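The problem is that the number of threads is only determined at run time, so I have to allocate mData from the heap.

Is there a way to avoid the heap allocation and use a regular array like the one above?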

Answer 1

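Every thread has its own stack, and remember that a stack frame is popped (or can be considered popped) when its function returns.

That is why we have "Stackless Python": one stack (which is what Python needs) and multiple threads do not play well together (see the Global Interpreter Lock).

You could put the array in main, where it would persist, but remember that C and C++ want to know every size at compile time, so if the number of threads varies (is not fixed at compile time), there is no way for the compiler to know it.

What you really want does not belong in main, and it cannot be a template on the thread count either, because that number cannot be known at compile time.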
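
One way around the compile-time restriction, if you can live with a hard upper limit, is to size the array for the largest pool you will ever create and pass the real count at run time. The following is only a sketch of that idea: BoundedThreadLocal, kMaxThreads and at() are names made up for illustration, and how a thread learns its index is left to the thread pool, as in the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical variant of the question's class: kMaxThreads is a compile-time
// upper bound, while the real thread count is supplied at run time.
template <typename T, std::size_t kMaxThreads>
class BoundedThreadLocal {
public:
    explicit BoundedThreadLocal(std::size_t threadCount) : mCount(threadCount) {
        assert(threadCount <= kMaxThreads && "pool is larger than the bound");
    }

    // The caller passes its pool index (0 .. threadCount-1).
    T& at(std::size_t threadIndex) {
        assert(threadIndex < mCount);
        return mData[threadIndex];
    }

    std::size_t size() const { return mCount; }
    static constexpr std::size_t capacity() { return kMaxThreads; }

private:
    std::array<T, kMaxThreads> mData{};  // stored inside the object, no heap
    std::size_t mCount;
};
```

The cost is that every instance reserves room for kMaxThreads elements even when fewer threads are running.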

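GCC provides a stack allocation function (similar to malloc, probably alloca), but I can't find it right now. It is best avoided anyway, because optimizations actually work without it.

Likewise, don't underestimate your CPU's prefetching and GCC's optimizer: putting the array on the heap is not bad either.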
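
To illustrate the heap option, here is a rough sketch that allocates the storage once, when the thread count becomes known, and never again. HeapThreadLocal and at() are illustrative names, not anything from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap-backed variant: one allocation in the constructor,
// none afterwards.
template <typename T>
class HeapThreadLocal {
public:
    explicit HeapThreadLocal(std::size_t threadCount) : mData(threadCount) {}

    // The caller passes its pool index (0 .. threadCount-1).
    T& at(std::size_t threadIndex) {
        assert(threadIndex < mData.size());
        return mData[threadIndex];
    }

    std::size_t size() const { return mData.size(); }

private:
    std::vector<T> mData;  // contiguous, never resized after construction
};
```

After construction, each access is a plain indexed load from contiguous memory, the same as with a built-in array, apart from one extra pointer indirection.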

An interesting read with good pictures, though unfortunately only loosely related to the topic: http://www.nongnu.org/avr-libc/user-manual/malloc.html

Answer 2

I would suggest a map keyed by thread id (declared below as myTLS).

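Either you lock every access precisely, or you populate the map up front and only read from it afterwards.

You can combine it with a stack allocator.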

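// The allocator is the fifth template argument, after the hash and equality functors.
std::unordered_map<std::thread::id, T, std::hash<std::thread::id>, std::equal_to<std::thread::id>, stackalloc> myTLS;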

https://howardhinnant.github.io/stack_alloc.html
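
For the "lock every access" variant of the map, something along these lines might work. This is only a sketch: MapThreadLocal and local() are made-up names, and it uses the default allocator rather than a stack allocator, so the map's nodes still come from the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical wrapper: one T per calling thread, looked up by thread id.
template <typename T>
class MapThreadLocal {
public:
    // Returns the calling thread's value, creating it on first use.
    T& local() {
        std::lock_guard<std::mutex> lock(mMutex);
        // References to unordered_map elements stay valid when the table
        // rehashes, so the returned reference can be used after unlocking.
        return mValues[std::this_thread::get_id()];
    }

    // Number of threads that have touched the map so far.
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mValues.size();
    }

private:
    std::mutex mMutex;
    std::unordered_map<std::thread::id, T> mValues;
};
```

Note that a thread id may be reused once its thread has finished, so entries left behind by dead threads can be picked up by later ones unless you erase them.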

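The stackalloc type in the myTLS declaration can be defined like this:

typedef short_alloc<std::pair<const std::thread::id, T>, maxthrds> stackalloc;

If you want a different solution, use a plain array with one slot per thread, for MESI bottleneck-free access.
With this method you have to take care of tracking the threads yourself, each with its own index into that array.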
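
A possible shape for such an array is sketched below. It assumes a 64-byte cache line, which is common on x86-64 but not universal, and that T needs no stricter alignment than that. PaddedSlot, PaddedThreadLocal and kCacheLine are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed cache-line size; adjust for the target CPU.
constexpr std::size_t kCacheLine = 64;

// Each slot is aligned (and therefore padded) to a full cache line, so two
// threads writing to neighbouring slots never touch the same line.
template <typename T>
struct alignas(kCacheLine) PaddedSlot {
    T value{};
};

template <typename T, std::size_t kMaxThreads>
class PaddedThreadLocal {
public:
    // The caller passes its own index; tracking which thread owns which
    // index is up to the thread pool.
    T& at(std::size_t threadIndex) {
        assert(threadIndex < kMaxThreads);
        return mSlots[threadIndex].value;
    }

private:
    std::array<PaddedSlot<T>, kMaxThreads> mSlots{};
};
```

Because no two slots share a cache line, a write by one thread does not invalidate the line another thread is working on, which is the coherence traffic the MESI remark refers to. The price is one full cache line of memory per thread, however small T is.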

Simulating a thread-local variable

2 answers: