Condition variables

Question

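I'm new to C, so I'm not sure where to start digging into my problem. I'm trying to port a Python number-crunching algorithm to C, and since there is no GIL in C (woohoo), I can change whatever memory I want from threads, as long as I make sure there are no races.

I've done my homework on mutexes, but I can't get my head around using them when continuously running threads access the same array over and over.

I'm using pthreads to split the workload on a large array a[N]. The number-crunching algorithm on a[N] is additive, so I split it using an a_diff[N_THREADS][N] array: each thread writes the changes it wants to apply to a[N] into its own row of a_diff[N_THREADS][N], and the rows are merged into a[N] after each step.

I need to run the crunching on different versions of the array a[N], so I pass them via a global pointer p (in the MWE there is only one a[N]).

I'm syncing the threads with another global array, SYNC_THREADS[N_THREADS], and I make the threads quit when I need them to by setting the END_THREADS global (I know, I'm using too many globals - I don't care, the code is ~200 lines). My question is about this syncing technique: is it safe, and what is a cleaner/better/faster way to do it?

MWE: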

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define N_THREADS 3
#define N 10000000
#define STEPS 3

double a[N];  // main array
double a_diff[N_THREADS][N];  // diffs array
double params[N];  // parameter used for number-crunching
double (*p)[N];  // pointer to array[N]

// structure for bounds for crunching the array
struct bounds {
    int lo;
    int hi;
    int thread_num;
};
struct bounds B[N_THREADS];
int SYNC_THREADS[N_THREADS];  // for syncing threads
int END_THREADS = 0;  // signal to terminate threads


static void *crunching(void *arg) {
    // multiple threads run number-crunching operations according to assigned low/high bounds
    struct bounds *data = (struct bounds *)arg;
    int lo = (*data).lo;
    int hi = (*data).hi;
    int thread_num = (*data).thread_num;
    printf("worker %d started for bounds [%d %d] \n", thread_num, lo, hi);

    int i;

    while (END_THREADS != 1) {  // END_THREADS tells threads to terminate
        if (SYNC_THREADS[thread_num] == 1) {  // SYNC_THREADS allows threads to start number-crunching
            printf("worker %d working... \n", thread_num );
            for (i = lo; i <= hi; ++i) {
                a_diff[thread_num][i] += (*p)[i] * params[i];  // pretend this is an expensive operation...
            }
            SYNC_THREADS[thread_num] = 0;  // thread disables itself until SYNC_THREADS is back to 1
            printf("worker %d stopped... \n", thread_num );
        }
    }
    return 0;
}


int i, j, th,s;
double joiner;

int main() {
    // pre-fill arrays
    for (i = 0; i < N; ++i) {
        a[i] = i + 0.5;
        params[i] = 0.0;
    }

    // split workload between workers
    int worker_length = N / N_THREADS;
    for (i = 0; i < N_THREADS; ++i) {
        B[i].thread_num = i;
        B[i].lo = i * worker_length;
        if (i == N_THREADS - 1) {
            B[i].hi = N;  // note: the worker loop uses i <= hi, so the last valid index is N - 1
        } else {
            B[i].hi = i * worker_length + worker_length - 1;
        }
    }
    // pointer to parameters to be passed to worker
    struct bounds **data = malloc(N_THREADS * sizeof(struct bounds*));
    for (i = 0; i < N_THREADS; i++) {
        data[i] = malloc(sizeof(struct bounds));
        data[i]->lo = B[i].lo;
        data[i]->hi = B[i].hi;
        data[i]->thread_num = B[i].thread_num;
    }
    // create thread objects
    pthread_t threads[N_THREADS];

    // disallow threads to crunch numbers
    for (th = 0; th < N_THREADS; ++th) {
        SYNC_THREADS[th] = 0;
    }

    // launch workers
    for(th = 0; th < N_THREADS; th++) {
        pthread_create(&threads[th], NULL, crunching, data[th]);
    }

    // big loop of iterations
    for (s = 0; s < STEPS; ++s) {
        for (i = 0; i < N; ++i) {
            params[i] += 1.0;  // adjust parameters
        }

        // zero diff array
        for (i = 0; i < N; ++i) {
            for (th = 0; th < N_THREADS; ++th) {
                a_diff[th][i] = 0.0;
            }
        }
        p = &a;  // pointer to array a
        // allow threads to process numbers and wait for threads to complete
        for (th = 0; th < N_THREADS; ++th) { SYNC_THREADS[th] = 1; }
        // ...here threads started by pthread_create do calculations...
        for (th = 0; th < N_THREADS; th++) { while (SYNC_THREADS[th] != 0) {} }

        // join results from threads (number-crunching is additive)
        for (i = 0; i < N; ++i) {
            joiner = 0.0;
            for (th = 0; th < N_THREADS; ++th) {
                joiner += a_diff[th][i];
            }
            a[i] += joiner;
        }
    }


    // join workers
    END_THREADS = 1;
    for(th = 0; th < N_THREADS; th++) {
        pthread_join(threads[th], NULL);
    }

    return 0;
}

I see that the workers' steps don't overlap in time:

worker 0 started for bounds [0 3333332]
worker 1 started for bounds [3333333 6666665]
worker 2 started for bounds [6666666 10000000]
worker 0 working...
worker 1 working...
worker 2 working...
worker 2 stopped...
worker 0 stopped...
worker 1 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 0 stopped...
worker 2 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 2 stopped...
worker 0 stopped...

Process returned 0 (0x0)   execution time : 1.505 s

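I make sure the workers don't get into each other's workspace by giving each one its own a_diff[thread_num][N] sub-array. However, I'm not sure that this always holds, or that I'm not introducing hidden races somewhere...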

Answer 1

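I didn't realize what the question was at first :-)

So, the question is whether your synchronization mechanism based on SYNC_THREADS and END_THREADS is a sound idea.
Yes!... almost. The problem is that the threads burn CPU while they wait.

Condition variables

To make threads wait for an event, you have condition variables (pthread_cond). They provide a few useful functions such as wait(), signal() and broadcast():

wait(&cond, &m) blocks the calling thread on the given condition variable. [Note 2]
signal(&cond) unblocks one thread waiting on the given condition variable.
broadcast(&cond) unblocks all threads waiting on the given condition variable.

Initially, all your threads are waiting [Note 1]: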

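pthread_mutex_lock(&m);
while(!start_threads)
  pthread_cond_wait(&cond_start, &m);
pthread_mutex_unlock(&m);

And, when the main thread is ready:

pthread_mutex_lock(&m);
start_threads = 1;
pthread_cond_broadcast(&cond_start);
pthread_mutex_unlock(&m);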
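
To show how the two fragments above fit together, here is a minimal sketch of a complete start gate. The helper run_start_gate(), the counter workers_done and the limit MAX_WORKERS are names I made up for illustration, and error checking on the pthread calls is left out for brevity.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16  /* hypothetical upper bound for this sketch */

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_start = PTHREAD_COND_INITIALIZER;
static int start_threads = 0;  /* the predicate, protected by m */
static int workers_done = 0;   /* illustration only, also protected by m */

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    while (!start_threads)                   /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond_start, &m);  /* sleeps instead of spinning; m is released while blocked */
    workers_done++;                          /* stand-in for the real number crunching */
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Starts n workers, opens the gate, and returns how many workers ran. */
int run_start_gate(int n) {
    assert(n > 0 && n <= MAX_WORKERS);
    pthread_t threads[MAX_WORKERS];

    start_threads = 0;  /* no worker exists yet, so no lock is needed here */
    workers_done = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, worker, NULL);

    /* Main thread is ready: change the predicate under the lock, then wake everyone. */
    pthread_mutex_lock(&m);
    start_threads = 1;
    pthread_cond_broadcast(&cond_start);
    pthread_mutex_unlock(&m);

    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return workers_done;  /* safe to read: all workers have been joined */
}
```

Call run_start_gate(4) from main() and build with -pthread. A worker that only reaches the wait loop after the broadcast sees start_threads == 1 and never blocks, which is why the predicate is needed in addition to the condition variable.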

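Barriers

If there are data dependencies between iterations, you want to make sure that all threads are executing the same iteration at any given moment.

To synchronize the threads at the end of each iteration, take a look at barriers (pthread_barrier):

pthread_barrier_init(count): initializes a barrier that synchronizes count threads.
pthread_barrier_wait(): a thread waits here until all count threads have reached the barrier.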
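
As a rough illustration of barriers keeping threads in the same iteration, the sketch below lets each thread read a neighbour's value from the previous step. The arrays cur_vals and next_vals and the helper run_barrier_steps() are invented for this example. Note that the real pthread_barrier_init() takes three arguments: the barrier, an attribute pointer (NULL here) and the count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define STEPS 3

static pthread_barrier_t barrier;
static long cur_vals[N_THREADS];   /* values from the previous step */
static long next_vals[N_THREADS];  /* values being computed in this step */

static void *worker(void *arg) {
    int t = *(int *)arg;
    for (int s = 0; s < STEPS; s++) {
        /* Data dependency: thread t needs its neighbour's previous value. */
        next_vals[t] = cur_vals[t] + cur_vals[(t + 1) % N_THREADS];
        pthread_barrier_wait(&barrier);  /* everyone has finished reading cur_vals */
        cur_vals[t] = next_vals[t];
        pthread_barrier_wait(&barrier);  /* everyone has published the new cur_vals */
    }
    return NULL;
}

/* Runs STEPS iterations on N_THREADS threads and returns cur_vals[0]. */
long run_barrier_steps(void) {
    pthread_t threads[N_THREADS];
    int ids[N_THREADS];

    int rc = pthread_barrier_init(&barrier, NULL, N_THREADS);
    assert(rc == 0);
    (void)rc;

    for (int t = 0; t < N_THREADS; t++) {
        cur_vals[t] = 1;
        ids[t] = t;
    }
    for (int t = 0; t < N_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, &ids[t]);
    for (int t = 0; t < N_THREADS; t++)
        pthread_join(threads[t], NULL);

    pthread_barrier_destroy(&barrier);
    return cur_vals[0];
}
```

Every value starts at 1 and doubles on each step, so after three steps run_barrier_steps() returns 8. Without the barriers a fast thread could overwrite cur_vals[t] while a neighbour is still reading it. Build with -pthread; pthread barriers are an optional POSIX feature and some platforms do not ship them.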

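Extending the functionality of barriers

Sometimes you want the last thread that reaches the barrier to compute something (for example, increment an iteration counter, compute some global value, or check whether execution should stop). You have two options.

Using `pthread_barrier`s

You essentially need two barrier waits: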

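int rc = pthread_barrier_wait(&b);
// exactly one (unspecified) thread gets PTHREAD_BARRIER_SERIAL_THREAD
if(rc == PTHREAD_BARRIER_SERIAL_THREAD)
  if(shouldStop()) stop = 1;
// second wait: nobody reads stop until it has been decided
pthread_barrier_wait(&b);
if(stop) return;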
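
The two-wait pattern can be sketched as a complete program as follows. The stopping rule (stop after MAX_ITERATIONS), the work_done counters and run_serial_thread_demo() are assumptions made for the example. The thread that receives PTHREAD_BARRIER_SERIAL_THREAD is chosen by the implementation, so it is not necessarily the last one to arrive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define MAX_ITERATIONS 5  /* hypothetical stopping rule for this sketch */

static pthread_barrier_t b;
static int iteration = 0;  /* written only by the serial thread, between the two waits */
static int stop = 0;       /* same: written between the waits, read after the second one */
static int work_done[N_THREADS];

static int shouldStop(void) { return iteration >= MAX_ITERATIONS; }

static void *worker(void *arg) {
    int t = *(int *)arg;
    for (;;) {
        work_done[t]++;                     /* stand-in for one iteration of real work */
        int rc = pthread_barrier_wait(&b);  /* all threads finished this iteration */
        if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
            iteration++;                    /* executed by exactly one thread */
            if (shouldStop()) stop = 1;
        }
        pthread_barrier_wait(&b);           /* the decision is now visible to everyone */
        if (stop) return NULL;
    }
}

/* Runs the workers until shouldStop() fires; returns the total units of work done. */
int run_serial_thread_demo(void) {
    pthread_t threads[N_THREADS];
    int ids[N_THREADS];
    int total = 0;

    iteration = 0;
    stop = 0;
    int rc = pthread_barrier_init(&b, NULL, N_THREADS);
    assert(rc == 0);
    (void)rc;

    for (int t = 0; t < N_THREADS; t++) {
        work_done[t] = 0;
        ids[t] = t;
    }
    for (int t = 0; t < N_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, &ids[t]);
    for (int t = 0; t < N_THREADS; t++)
        pthread_join(threads[t], NULL);
    pthread_barrier_destroy(&b);

    for (int t = 0; t < N_THREADS; t++)
        total += work_done[t];
    return total;
}
```

Each of the four threads performs exactly five iterations, so the function returns 20. The second wait is what makes the plain int stop safe here: it is only written between the two waits and only read after the second one.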

Using `pthread_cond` to implement our own specialized barrier

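pthread_mutex_lock(&mutex);
// all threads execute this
executedByAllThreads();
// generation is a shared counter (initially 0) identifying the current round
unsigned long myGeneration = generation;
remainingThreads--;
if(remainingThreads == 0) {
  // only last thread executes this
  if(shouldStop()) stop = 1;
  // reinitialize barrier and start a new round
  remainingThreads = N;
  generation++;
  pthread_cond_broadcast(&cond);
} else {
  // waiting on remainingThreads > 0 would never end, because the count was just reset to N
  while(generation == myGeneration)
    pthread_cond_wait(&cond, &mutex);
}
pthread_mutex_unlock(&mutex);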
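
Here is one way such a condition-variable barrier could look as a complete, reusable sketch. The generation counter, the last_arrivals counter and the function names are my own additions for illustration. The generation counter lets a waiting thread tell that its round has ended even though the last thread has already reset remainingThreads for the next round.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define ROUNDS 3

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int remainingThreads = N_THREADS;
static unsigned long generation = 0;  /* incremented once per completed round */
static int last_arrivals = 0;         /* illustration: how often the "last thread" hook ran */

static void barrier_wait_with_hook(void) {
    pthread_mutex_lock(&mutex);
    unsigned long my_generation = generation;
    remainingThreads--;
    if (remainingThreads == 0) {
        last_arrivals++;               /* only the last thread of the round executes this */
        remainingThreads = N_THREADS;  /* reinitialize the barrier for the next round */
        generation++;
        pthread_cond_broadcast(&cond);
    } else {
        /* Loop on the generation, not the count: the count was already reset. */
        while (generation == my_generation)
            pthread_cond_wait(&cond, &mutex);
    }
    pthread_mutex_unlock(&mutex);
}

static void *worker(void *arg) {
    (void)arg;
    for (int r = 0; r < ROUNDS; r++)
        barrier_wait_with_hook();
    return NULL;
}

/* Runs ROUNDS barrier rounds on N_THREADS threads; returns how often the hook ran. */
int run_custom_barrier(void) {
    pthread_t threads[N_THREADS];

    assert(remainingThreads == N_THREADS);  /* barrier must be idle before a run */
    last_arrivals = 0;
    for (int t = 0; t < N_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, NULL);
    for (int t = 0; t < N_THREADS; t++)
        pthread_join(threads[t], NULL);
    return last_arrivals;
}
```

The hook runs once per round, so run_custom_barrier() returns 3. Unlike with pthread_barrier, the thread that runs the hook here really is the last one to arrive, and it runs while still holding the mutex.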

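Note 1: Why is pthread_cond_wait() inside a while loop? It may look a bit odd. The reason is the existence of spurious wakeups: the function may return even though no signal() or broadcast() was issued. So, to guarantee correctness, it is usual to have an extra variable (the predicate) which ensures that a thread that wakes up before it should goes back into pthread_cond_wait().

From the manual:

When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.

(...)

If a signal is delivered to a thread waiting for a condition variable, upon return from the signal handler the thread resumes waiting for the condition variable as if it was not interrupted, or it shall return zero due to spurious wakeup.

Note 2:

Michael Burr pointed out in the comments that you should hold the companion lock whenever you modify the predicate (start_threads) and whenever you call pthread_cond_wait(). pthread_cond_wait() releases the mutex when it is called and re-acquires it before it returns.

PS: It's a bit late here; sorry if my text is confusing :-)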
