Question

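The program creates multiple threads, each of which increases a shared variable by 10000 in a for loop, incrementing it by 1 on every iteration. Both a mutex version and a spinlock (busy-waiting) version are required. As far as I know, the mutex version should be faster than the spinlock. But my implementation gives me the opposite result...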

Here is the per-thread implementation in the mutex version:

void *incr(void *tid)
{
    int i;
    for(i = 0; i < 10000; i++)
    {
        pthread_mutex_lock(&the_mutex);     //Grab the lock
        sharedVar++;    //Increment the shared variable
        pthread_mutex_unlock(&the_mutex);   //Release the lock
    }
    pthread_exit(0);
}

Here is the implementation in the spinlock version:

void *incr(void *tid)
{
    int i;
    for(i = 0; i < 10000; i++)
    {
        enter_region((int)tid);  //Grab the lock
        sharedVar++;        //Increment the shared variable
        leave_region((int)tid);  //Release the lock
    }
    pthread_exit(0);
}
void enter_region(int tid)
{
    interested[tid] = true;     //Show this thread is interested
    turn = tid;     //Set flag
    while(turn == tid && other_interested(tid));    //Busy waiting
}
bool other_interested(int tid)    //interested[] is initialized to all false
{
    int i;
    for(i = 0; i < tNumber; i++)
        if(i != tid)
            if(interested[i] == true)   //There are other threads that are interested
                return true;
    return false;
}
void leave_region(int tid)
{
    interested[tid] = false;    //Depart from critical region
}

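I also repeated the whole process of creating and running the threads hundreds of times so that the difference in execution time would be measurable. For example, if tNumber is 4 and I repeat the program 1000 times, the mutex version takes 2.22 seconds and the spinlock version takes 1.35 seconds. The difference grows as tNumber increases. Why does this happen? Is my code wrong?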

Answer 1

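The code between enter_region and leave_region is not protected.

You can prove this by making the critical section more complicated, so that it is sure to trip over itself.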

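Create an array of bools (check) of length 10000 * tNumber, with every entry set to false. Then make the code between enter_region and leave_region: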

if (check[sharedVar]) cout << "ERROR" << endl;
else check[sharedVar++] = true;
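
The check above prints with C++ streams, while the question's code is C. A minimal C sketch of the same idea follows. It assumes 4 threads (hence MAX_COUNT of 40000); the errors counter and the name checked_increment are hypothetical additions for illustration. If two threads are ever inside the region at once, one of them may find its slot already claimed.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_COUNT 40000          /* assumption: 4 threads x 10000 increments */

static bool check[MAX_COUNT];    /* static storage, so it starts out all false */
static int sharedVar = 0;
static int errors = 0;           /* hypothetical counter, not in the original */

/* Intended to be called between enter_region() and leave_region(). */
static void checked_increment(void)
{
    int v = sharedVar;                   /* read the counter once */
    if (v < 0 || v >= MAX_COUNT)
        return;                          /* stay inside the array */
    if (check[v]) {
        errors++;                        /* someone already claimed this slot */
        fprintf(stderr, "ERROR at %d\n", v);
    } else {
        check[v] = true;                 /* claim the slot ... */
        sharedVar = v + 1;               /* ... then advance the counter */
    }
}
```

With a lock that really provides mutual exclusion, errors should stay at 0 and sharedVar should end at 10000 times the number of threads.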

Answer 2

The "difference" in speed comes from the fact that you implement synchronization using

interested[tid] = true;     //Show this thread is interested
turn = tid;     //Set flag
while(turn == tid && other_interested(tid));

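This is a sequence of separate operations. Any thread can be preempted while executing it, and the next thread then reads an inconsistent state.

It needs to be done atomically, by implementing a compare-and-swap or a test-and-set. These instructions are usually provided by the hardware.

For example, on x86 you have xchg, cmpxchg/cmpxchg8b, xadd
Your test could be rewritten as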

while( compare_and_swap_atomic(myid,id_meaning_it_is_free) == false);
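
One way to make that loop concrete is with C11 atomics. The sketch below is only an illustration, not the answerer's code: it assumes a C11 compiler with <stdatomic.h>, and LOCK_FREE_ID, owner, MAX_THREADS and run_counter are hypothetical names. atomic_compare_exchange_weak plays the role of compare_and_swap_atomic.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_FREE_ID (-1)        /* hypothetical "nobody holds the lock" value */
#define MAX_THREADS 16           /* hypothetical upper bound for this sketch */

static atomic_int owner = LOCK_FREE_ID;
static long sharedVar = 0;       /* plain variable, protected by the lock */

static void enter_region(int tid)
{
    int expected = LOCK_FREE_ID;
    /* Atomically: if owner == expected, set owner = tid; otherwise retry. */
    while (!atomic_compare_exchange_weak(&owner, &expected, tid))
        expected = LOCK_FREE_ID; /* a failed CAS overwrote expected */
}

static void leave_region(void)
{
    atomic_store(&owner, LOCK_FREE_ID);   /* release the lock */
}

static void *incr(void *arg)
{
    int tid = (int)(intptr_t)arg;
    for (int i = 0; i < 10000; i++) {
        enter_region(tid);
        sharedVar++;
        leave_region();
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
static long run_counter(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    sharedVar = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, incr, (void *)(intptr_t)i);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return sharedVar;
}
```

Calling run_counter(4) should return 40000 every time, because only one thread can move owner away from LOCK_FREE_ID at any moment. Build with -pthread.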

The catch is that atomicity is expensive.
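
Since xadd is listed among the x86 instructions, it may be worth noting that when the critical section is nothing more than a single increment, an atomic fetch-and-add can stand in for the lock entirely. This is a sketch under that assumption and not something the answer itself proposes; sharedCounter, incr_atomic and run_atomic_counter are hypothetical names. Each increment is still an atomic read-modify-write, so it still pays the cost mentioned above, just once per iteration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16                  /* hypothetical upper bound for this sketch */

static atomic_long sharedCounter = 0;   /* hypothetical atomic stand-in for sharedVar */

static void *incr_atomic(void *arg)
{
    (void)arg;                          /* thread id is not needed here */
    for (int i = 0; i < 10000; i++)
        atomic_fetch_add(&sharedCounter, 1);   /* one atomic read-modify-write */
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
static long run_atomic_counter(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    atomic_store(&sharedCounter, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, incr_atomic, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&sharedCounter);
}
```

This only works because the protected operation fits in one atomic instruction; a longer critical section still needs a real lock.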

Measuring the efficiency of Mutex and Busy waiting