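I have the following code, which starts several threads (a thread pool) right at the beginning (startWorkers()). Later, at some point, I have a container full of myWorkObject instances that I want to process with several worker threads at once. In terms of memory use, each myWorkObject is completely isolated from the others. For now, let's assume myWorkObject has a method doWorkIntenseStuffHere() that takes some CPU time to compute.
While benchmarking the following code, I noticed that it does not scale well with the number of threads, and that the overhead of initializing/synchronizing the worker threads outweighs the benefit of multithreading unless 3-4 threads are active. I have looked into this and read about the false sharing problem, and I suspect my code suffers from it. However, I would like to debug/profile my code to see whether some kind of starvation/false sharing is actually going on. How can I do that? Feel free to criticize my code, as I am still learning a lot about memory/CPU and multithreading in particular.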
#include <cassert>
#include <queue>
#include <vector>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

class MultiThreadedFitnessProcessingStrategy
{
public:
    MultiThreadedFitnessProcessingStrategy(unsigned int numWorkerThreads):
        _startBarrier(numWorkerThreads + 1),
        _endBarrier(numWorkerThreads + 1),
        _started(false),
        _shutdown(false),
        _numWorkerThreads(numWorkerThreads)
    {
        assert(_numWorkerThreads > 0);
    }

    virtual ~MultiThreadedFitnessProcessingStrategy()
    {
        stopWorkers();
    }

    void startWorkers()
    {
        _shutdown = false;
        _started = true;
        for(unsigned int i = 0; i < _numWorkerThreads; i++)
        {
            boost::thread* workerThread = new boost::thread(
                boost::bind(&MultiThreadedFitnessProcessingStrategy::workerTask, this, i)
            );
            _threadQueue.push_back(new std::queue<myWorkObject::ptr>());
            _workerThreads.push_back(workerThread);
        }
    }

    void stopWorkers()
    {
        //Nothing to stop if the workers never started; waiting on the barrier alone would block forever.
        if(!_started)
        {
            return;
        }
        _startBarrier.wait();
        _shutdown = true;
        _endBarrier.wait();
        for(unsigned int i = 0; i < _numWorkerThreads; i++)
        {
            _workerThreads[i]->join();
            //Release the thread and queue objects allocated in startWorkers().
            delete _workerThreads[i];
            delete _threadQueue[i];
        }
        _workerThreads.clear();
        _threadQueue.clear();
        _started = false;
    }

    void workerTask(unsigned int id)
    {
        //Wait until all worker threads have started.
        while(true)
        {
            //Wait for any input to become available.
            _startBarrier.wait();
            bool queueEmpty = false;
            std::queue<myWorkObject::ptr>* myThreadq(_threadQueue[id]);
            while(!queueEmpty)
            {
                myWorkObject::ptr workItem;
                //Make sure queue is not empty,
                //Caution: this is necessary if start barrier was triggered without queue input (e.g., shutdown) , which can happen.
                //Do not try to be smart and refactor this without knowing what you are doing!
                queueEmpty = myThreadq->empty();
                if(!queueEmpty)
                {
                    workItem = myThreadq->front();
                    assert(workItem);
                    myThreadq->pop();
                }
                if(workItem)
                {
                    workItem->doWorkIntenseStuffHere();
                }
            }
            //Wait until all worker threads have synchronized.
            _endBarrier.wait();
            if(_shutdown)
            {
                return;
            }
        }
    }

    void doWork(const myWorkObject::chromosome_container &refcontainer)
    {
        if(!_started)
        {
            startWorkers();
        }
        unsigned int j = 0;
        for(myWorkObject::chromosome_container::const_iterator it = refcontainer.begin();
            it != refcontainer.end(); ++it)
        {
            //Check the pointer before dereferencing it.
            assert(*it);
            if(!(*it)->hasFitness())
            {
                //Distribute the work items round-robin over the per-thread queues.
                _threadQueue[j%_numWorkerThreads]->push(*it);
                j++;
            }
        }
        //Start Signal!
        _startBarrier.wait();
        //Wait for workers to be complete
        _endBarrier.wait();
    }

    unsigned int getNumWorkerThreads() const
    {
        return _numWorkerThreads;
    }

    bool isStarted() const
    {
        return _started;
    }

private:
    boost::barrier _startBarrier;
    boost::barrier _endBarrier;
    bool _started;
    bool _shutdown;
    unsigned int _numWorkerThreads;
    std::vector<boost::thread*> _workerThreads;
    std::vector< std::queue<myWorkObject::ptr >* > _threadQueue;
};
Answer 0 (score: 1)
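Sampling-based profiling can give you a good idea of whether you are experiencing false sharing. There is a previous thread here that covers several ways of approaching the problem. I don't think that thread mentions Linux's perf utility. It is a quick, easy, and free way to count cache misses, and it may tell you what you need to know (am I seeing a large number of cache misses, and do they correlate with how often I access a particular variable?).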
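To get a feel for what such a measurement looks like, it can help to profile a tiny program where false sharing is known to be present. The following is a minimal sketch, not the asker's code: PackedCounters, PaddedCounters, and hammer are hypothetical names, std::thread (C++11) stands in for boost::thread, and a 64-byte cache line is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Two counters declared back to back, so they very likely share one cache line.
struct PackedCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// The same counters, each forced onto its own line (64 bytes is an assumption).
struct PaddedCounters {
    alignas(64) std::atomic<std::uint64_t> a{0};
    alignas(64) std::atomic<std::uint64_t> b{0};
};

// Increment c.a and c.b from two separate threads; returns elapsed milliseconds.
template <typename Counters>
double hammer(Counters& c, std::uint64_t iterations) {
    assert(iterations > 0);
    auto bump = [iterations](std::atomic<std::uint64_t>& x) {
        for (std::uint64_t i = 0; i < iterations; ++i) {
            x.fetch_add(1, std::memory_order_relaxed);
        }
    };
    auto start = std::chrono::steady_clock::now();
    std::thread t1(bump, std::ref(c.a));
    std::thread t2(bump, std::ref(c.b));
    t1.join();
    t2.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling hammer on a PackedCounters and then on a PaddedCounters from a small main, and running each variant under something like perf stat -e cache-misses, should show how the two cases differ on a given machine. The exact numbers depend heavily on the hardware, so no expected output is given here.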
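If you do find that your threading scheme may be causing a lot of conflict misses, you can try declaring your myWorkObject instances, or the data inside them that you actually care about, with __attribute__((aligned(64))) (that is, aligned to 64-byte cache lines).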
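As a rough illustration of that alignment advice, here is a sketch that uses the standard C++11 alignas specifier in place of the GCC-specific __attribute__((aligned(64))). WorkerScratch, kCacheLine, and onDifferentLines are hypothetical names made up for this example, and the 64-byte line size is an assumption that holds on typical x86-64 machines but not everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// Hypothetical stand-in for the hot data inside a myWorkObject.
// alignas makes every instance start on a cache-line boundary, and the
// compiler pads sizeof up to a multiple of that alignment.
struct alignas(kCacheLine) WorkerScratch {
    double fitness = 0.0;
    std::uint64_t evaluations = 0;
};

static_assert(alignof(WorkerScratch) == kCacheLine,
              "each instance must start on a cache-line boundary");
static_assert(sizeof(WorkerScratch) % kCacheLine == 0,
              "padding must fill out whole cache lines");

// True when the two addresses fall on different cache lines.
inline bool onDifferentLines(const void* p, const void* q) {
    std::uintptr_t lineP = reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
    std::uintptr_t lineQ = reinterpret_cast<std::uintptr_t>(q) / kCacheLine;
    return lineP != lineQ;
}
```

With a plain array such as WorkerScratch slots[4], onDifferentLines(&slots[0], &slots[1]) holds, so two workers writing to neighbouring slots no longer invalidate each other's cache line. Note that for heap-allocated containers, over-aligned types are only guaranteed to be allocated correctly from C++17 onward.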
Answer 1 (score: 1)
If you are on Linux, there is a tool called valgrind, one of whose modules (cachegrind) simulates cache effects. Please take a look at