Question

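I'm currently using Boost 1.55.0, and I can't understand why this code doesn't work.

The following code is a simplified version that shows the same problem as my program. Small runs finish, but with larger ones the threads keep waiting forever.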

#include <cmath>    // fmod
#include <cstdio>   // printf
#include <boost/thread.hpp>

boost::mutex m1;
boost::mutex critical_sim;

int total= 50000;

class krig{

public:

    float dokrig(int in,float *sim, bool *aux, boost::condition_variable *hEvent){

        float simnew=0;

        boost::mutex::scoped_lock lk(m1);

        if (in > 0)
        {
            while(!aux[in-1]){
                hEvent[in-1].wait(lk);  
            }
            simnew=1+sim[in-1];

        }

        return simnew;
    };

};

void Simulnode( int itrd,float *sim, bool *aux, boost::condition_variable *hEvent){
    int j;
    float simnew;

    krig kriga;

    for(j=itrd; j<total; j=j+2){

        if (fmod(1000.*j,total) == 0.0){
            printf (" .progress. %f%%\n",100.*(float)j/(float)total);
        }

        simnew= kriga.dokrig(j,sim, aux, hEvent);

        critical_sim.lock();
        sim[j]=simnew;
        critical_sim.unlock();

        aux[j]=true;
        hEvent[j].notify_one();
    }
}


int main(int argc, char* argv[])
{
    int i;
    float *sim = new float[total];

    bool *aux = new bool[total];

    for(i=0; i<total; ++i)
        aux[i]=false;

//boost::mutex m1;

    boost::condition_variable *hEvent = new boost::condition_variable[total];

    boost::thread_group tgroup;
    for(i=0; i<2; ++i) {
        tgroup.add_thread(new boost::thread(Simulnode, i,sim, aux, hEvent));

    }
    tgroup.join_all();

    return 0;
}

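Strangely, I noticed that if I put the code from inside dokrig() inline into Simulnode(), it seems to work. Could this be a problem with the scope of the lock?

Can anyone tell me where I went wrong? Thanks in advance.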

Answer 1

The problem is in this part:

aux[j]=true;
hEvent[j].notify_one();

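The first line changes the condition that the hEvent condition variable monitors. The second line announces that change to the consumer side, which is waiting for the condition to become true.

The problem is that these two steps happen without any synchronization with the consumer, which can lead to the following race:

1. The consumer checks the condition, which is currently false. This happens inside the critical section protected by the mutex m1.
2. A thread switch occurs. The producer changes the condition to true and notifies any waiting consumers.
3. The thread switches back. The consumer resumes and calls wait. However, it has already missed the notification sent in the previous step, so it will wait forever.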

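It is important to understand that the purpose of the mutex passed to the condition variable's wait call is not to protect the condition variable itself, but to protect the condition it monitors (in this case, the changes to aux).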
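
To illustrate that point in isolation, here is a minimal sketch of a one-shot event. It uses the C++11 standard-library types (std::mutex, std::condition_variable) in place of their Boost counterparts so that it compiles without Boost. The names Event and wait_then_read are hypothetical and do not appear in the question. The flag is only ever read or written while the mutex is held, which is what makes the check-then-wait sequence safe.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event: the mutex guards flag_, not cv_.
class Event {
public:
    void set() {
        std::lock_guard<std::mutex> lk(m_);
        flag_ = true;        // the condition changes while the mutex is held
        cv_.notify_all();    // waiters wake up once the mutex is released
    }

    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        // The check and the wait happen under the same mutex, so set()
        // cannot slip in between them. The loop also absorbs spurious wakeups.
        while (!flag_) {
            cv_.wait(lk);
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool flag_ = false;
};

// Small usage example: a producer publishes a value, the caller waits for it.
int wait_then_read() {
    Event e;
    int payload = 0;
    std::thread producer([&] {
        payload = 42;  // written before set(), so it is visible after wait()
        e.set();
    });
    e.wait();
    int seen = payload;
    producer.join();
    return seen;
}
```

If set() ran before the waiter took the lock, the waiter would see the flag already true and skip waiting. If it ran later, the waiter would already be blocked inside wait and receive the notification.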

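To avoid this race, the write to aux and the subsequent notification must be protected by the same mutex that the consumer holds while it checks the condition and waits: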

{
    boost::lock_guard<boost::mutex> lk(m1);
    aux[j]=true;
    hEvent[j].notify_one();
}
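
For reference, the whole program with this fix applied might look roughly like the sketch below. It is not the asker's actual code: it swaps in the C++11 standard-library equivalents of the Boost types so it has no external dependencies, and the names Chain, worker and run_chain are made up for this example. It also writes sim[j] under the same mutex, on the assumption that a second mutex like critical_sim is not needed once everything shares one lock.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's shared state.
struct Chain {
    explicit Chain(int n) : sim(n, 0.0f), done(n, 0), ready(n) {}
    std::vector<float> sim;                      // results (sim in the question)
    std::vector<char> done;                      // flags (aux in the question)
    std::vector<std::condition_variable> ready;  // hEvent in the question
    std::mutex m;                                // m1 in the question
};

// Each worker handles elements start, start + stride, ...
// Element j depends on element j - 1, which the other worker produces.
void worker(Chain& c, int start, int stride) {
    const int n = static_cast<int>(c.sim.size());
    for (int j = start; j < n; j += stride) {
        std::unique_lock<std::mutex> lk(c.m);
        float value = 0.0f;
        if (j > 0) {
            // The predicate is evaluated under the lock, so a notification
            // sent by the other worker cannot be missed.
            c.ready[j - 1].wait(lk, [&] { return c.done[j - 1] != 0; });
            value = 1.0f + c.sim[j - 1];
        }
        c.sim[j] = value;
        c.done[j] = 1;             // flag written while holding the mutex
        c.ready[j].notify_one();
    }
}

// Runs two workers over `total` elements and returns the last result.
float run_chain(int total) {
    if (total <= 0) return 0.0f;
    Chain c(total);
    std::thread even(worker, std::ref(c), 0, 2);
    std::thread odd(worker, std::ref(c), 1, 2);
    even.join();
    odd.join();
    return c.sim[total - 1];
}
```

Calling run_chain(50000) corresponds to the question's total of 50000. Each element is one more than its predecessor, so the final value should equal total - 1 if no worker got stuck.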

Thread synchronization with boost::condition_variable