Question

My output shows that threads 1 and 2 get far more priority than the other threads. My implementation is below.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
volatile int NUM_THREADS = 10;
volatile int Number[10] = {0};
volatile int count_cs[10] = {0};
volatile int Entering[10] = {0};

int max()
{
    int i = 0;
    int j = 0;
    int maxvalue = 0;
    for(i = 0; i < 10; i++)
    {
        if ((Number[i]) > maxvalue)
        {
              maxvalue = Number[i];
        }
    }
    return maxvalue;
}

void lock(int i)
{
    int j;
    Entering[i] = 1;
    Number[i] = 1 + max();
    Entering[i] = 0;
    for (j = 0; j < NUM_THREADS; j++)
    {
        while (Entering[j]) { } /* Do nothing */
        while ((Number[j] != 0) &&
               ((Number[j] < Number[i]) ||
                ((Number[j] == Number[i]) && (j < i)))) { }
    }
}

void unlock(int i) {
    Number[i] = 0;
}

void Thread(int i) {
   while (1) {
       lock(i);
       count_cs[i+1] = count_cs[i+1] + 1 ;
       //printf("critical section of %d\n", i+1);
       unlock(i);
   }
}

int main()
{
   int duration = 10000;
   pthread_t threads[NUM_THREADS];
   int rc;
   long t;
   for(t = 0; t < NUM_THREADS; t++){
       printf("In main: creating thread %ld\n", t+1);
       rc = pthread_create(&threads[t], NULL, Thread, (int)t);
       if (rc){
           printf("ERROR; return code from pthread_create() is %d\n", rc);
           exit(-1);
        }
   }
   usleep(duration*1000);
   for(t=0; t < NUM_THREADS; t++)
    {
    printf("count of thread no %ld is %d\n",t+1,count_cs[t+1]);
    }
   return 0;
}

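If I print some value inside the critical section, the counts for all threads come out almost equal. Why do I get this variation in the output?

Output without the print statement in the critical section: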

count of thread no 1 is 551013
count of thread no 2 is 389269
count of thread no 3 is 3
count of thread no 4 is 3
count of thread no 5 is 3
count of thread no 6 is 3
count of thread no 7 is 3
count of thread no 8 is 3
count of thread no 9 is 3
count of thread no 10 is 3

Output with the print statement in the critical section:

count of thread no 1 is 5
count of thread no 2 is 6
count of thread no 3 is 5
count of thread no 4 is 5
count of thread no 5 is 5
count of thread no 6 is 5
count of thread no 7 is 4
count of thread no 8 is 4
count of thread no 9 is 4
count of thread no 10 is 4

To avoid memory-model problems, I restricted the threads to a single CPU by running my program on Linux with taskset 0x00000001 ./a.out.

Answer 1

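There are several issues here.

First, pthread_create takes a considerable amount of time: certainly much longer than a fast lock/increment count/unlock iteration. The first thread therefore has a big advantage over the others because it runs first, the second thread gets a smaller advantage, and so on. When you put the printf in the loop, it slows the threads down, so the advantage is smaller.

On a related note, just because pthread_create has returned, the thread has not necessarily started. It only means that the scheduler will now consider it.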

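Third, your lock implementation is a busy-wait loop, so whichever thread is running will hog all the available CPU time. Since you are running the code on a single core, if the thread that holds the lock is suspended, the other threads spend their entire time slices busy-waiting before the thread holding the lock can resume, unlock, and then try to take the lock again.

Finally, when the lock is contended, this algorithm prioritizes the thread with the lowest number, so thread 0 will acquire the lock more often than the others: all the threads are busy-waiting, so contention is high.

Try adding some sched_yield() calls to the loops in lock() to give the thread holding the lock more chance to run.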
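As a rough sketch of that suggestion, the two empty wait loops in lock() can call sched_yield() instead of spinning. This is an illustration under assumptions rather than a drop-in replacement: max_ticket is a hypothetical name for the question's max(), NUM_THREADS is made a compile-time constant, and the loop is assumed to be meant to cover indices 0 to NUM_THREADS - 1. It still relies on volatile and a single CPU, as the question does.

```c
#include <assert.h>
#include <sched.h>

#define NUM_THREADS 10

static volatile int Entering[NUM_THREADS];
static volatile int Number[NUM_THREADS];

/* Largest ticket currently held by any thread (0 if none). */
static int max_ticket(void)
{
    int maxvalue = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int n = Number[i];
        if (n > maxvalue)
            maxvalue = n;
    }
    return maxvalue;
}

void lock(int i)
{
    assert(i >= 0 && i < NUM_THREADS);

    /* Doorway: take a ticket one higher than any ticket seen. */
    Entering[i] = 1;
    Number[i] = 1 + max_ticket();
    Entering[i] = 0;

    for (int j = 0; j < NUM_THREADS; j++) {
        /* Give up the CPU instead of spinning while j picks its ticket. */
        while (Entering[j])
            sched_yield();

        /* Give up the CPU while j holds an earlier ticket, or the same
           ticket with a lower thread index. */
        while (Number[j] != 0 &&
               (Number[j] < Number[i] ||
                (Number[j] == Number[i] && j < i)))
            sched_yield();
    }
}

void unlock(int i)
{
    assert(i >= 0 && i < NUM_THREADS);
    Number[i] = 0;   /* ticket 0 means "not interested" */
}
```

A waiting thread now hands the CPU back straight away, so on a single core the lock holder should get to run again sooner than it would after a full time slice of spinning.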

Answer 2

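I see that you are running on a single CPU, so you may be able to avoid the following problem. Still, keep it in mind.

Note that, unlike Microsoft's compiler, GCC does not give volatile any special non-standard SMP threading meaning, so you cannot rely on it for ordering between CPUs. This means that if Number and Entering are on different cache lines, the writes that CPU-0 makes to Number and Entering may become visible on CPU-1 in a different order than you expect.

To fix this, you need to use atomic operations. GCC has these built in.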
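For illustration, here is one way the shared accesses might look with GCC's __atomic_load_n and __atomic_store_n builtins and sequentially consistent ordering. Treat it as a sketch under assumptions: load, store, take_ticket and must_wait_for are hypothetical helper names, not part of the question's code, and the choice of __ATOMIC_SEQ_CST is mine (it is the strongest and simplest ordering to reason about).

```c
#include <assert.h>

#define NUM_THREADS 10

static int Entering[NUM_THREADS];
static int Number[NUM_THREADS];

/* Sequentially consistent wrappers around the GCC builtins. */
static int load(int *p)
{
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

static void store(int *p, int v)
{
    __atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}

/* Doorway: thread i takes a ticket one higher than any it observes. */
void take_ticket(int i)
{
    assert(i >= 0 && i < NUM_THREADS);

    store(&Entering[i], 1);
    int maxvalue = 0;
    for (int j = 0; j < NUM_THREADS; j++) {
        int n = load(&Number[j]);
        if (n > maxvalue)
            maxvalue = n;
    }
    store(&Number[i], 1 + maxvalue);
    store(&Entering[i], 0);
}

/* Nonzero while thread i still has to wait for thread j. */
int must_wait_for(int i, int j)
{
    if (load(&Entering[j]))
        return 1;                 /* j is in the middle of picking a ticket */
    int nj = load(&Number[j]);
    int ni = load(&Number[i]);
    return nj != 0 && (nj < ni || (nj == ni && j < i));
}
```

A lock built on these helpers would call take_ticket(i) and then loop over every j until must_wait_for(i, j) returns 0. Because every load and store is sequentially consistent, other CPUs cannot observe the writes to Entering and Number out of program order.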

Threads 1 and 2 get more priority in my implementation of Lamport's bakery algorithm
