Question

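I have some task-scheduling code that I want to compare against a baseline that simply creates a new pthread for every task (I know that's not a good idea, which is exactly why it is only a baseline for comparison). However, for some reason the pthreads version keeps segfaulting on OS X ¹, while everything works fine when I run the same code on Linux ².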

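On OS X it occasionally completes successfully, but it usually segfaults in pthread_create, and sometimes in pthread_join. I also found that if I call pthread_create with the PTHREAD_CREATE_DETACHED attribute and skip the pthread_join calls, the segfaults go away.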
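For reference, the detached-thread workaround might look roughly like the sketch below. The helper names (spawn_detached, detached_worker, wait_until_done) and the done flag are assumptions made for illustration; only the pthread_attr_* calls and PTHREAD_CREATE_DETACHED come from the pthreads API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical completion flag: a detached thread cannot be joined,
 * so the creator needs some other way to learn that it has finished. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done = 0;

/* Hypothetical worker: records that it ran and wakes the waiter. */
static void *detached_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done = 1;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Create a thread in the detached state.
 * Returns the pthread error code (0 on success). */
static int spawn_detached(void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Block until the worker has signalled completion. */
static void wait_until_done(void) {
    pthread_mutex_lock(&done_lock);
    while (!done) pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);
}
```

A caller would invoke spawn_detached(detached_worker, NULL) and then wait_until_done(). The resources of a detached thread are reclaimed when it exits, without any pthread_join.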

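At the bottom of this question is a stripped-down version of the code, which I tried to minimize as much as possible while still triggering the problematic segfaults.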

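My question is the following:

Why does this crash on OS X but not on Linux?

Maybe there is a bug I'm overlooking that happens to be benign on Linux. I'm fairly sure the mutex and CAS operations provide sufficient synchronization, so I don't think this is a data-race issue.

Like I said, I can work around this by using PTHREAD_CREATE_DETACHED, but I'm really curious about the root cause of the segfaults. My feeling is that I'm currently overwhelming some system resource limit that isn't released quickly enough when I require threads to be joined, whereas the problem is fixed for detached pthreads because they can be destroyed as soon as the thread exits; however, I'm not familiar enough with pthread internals to confirm or refute this hypothesis.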

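Here is an overview of how the code works:

We have a stack of pthreads (accessed via wait_list_head) that are currently blocked waiting for a signal on a thread-specific condition variable.
The main thread creates a single child thread and then waits for all transitive children to finish (by checking whether the active thread counter has reached zero).
The child thread computes Fibonacci(N=10) by creating two sub-threads to compute Fibonacci(N-1) and Fibonacci(N-2), then joins the two threads, adds their results, and returns that sum as its own result. This is how all child threads work, with the N<2 base case simply returning N.
Note that the stack of blocked threads semi-randomizes which threads get joined by which parent thread. That is, a parent thread may join the child of one of its siblings rather than its own child; however, the final sum is still the same because integer addition is commutative. Eliminating this "randomized" behavior by having each parent join its own children also eliminates the segfaults.
There is also a simple pure-recursive Fibonacci implementation (pure_fib) that computes the expected answer for validation.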
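The variant mentioned in the overview, in which each parent joins its own children, can be sketched as follows. This is a minimal illustration and not the asker's code: own_join_fib and threaded_fib are hypothetical names, and results travel through the thread return value instead of the shared wait list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical thread body: each parent creates two children and
 * joins exactly those two, so there is no shared wait list at all. */
static void *own_join_fib(void *arg) {
    intptr_t n = (intptr_t)arg;
    if (n < 2) return (void *)n;  /* base case: Fibonacci(n) == n */

    pthread_t t1, t2;
    void *r1, *r2;
    if (pthread_create(&t1, NULL, own_join_fib, (void *)(n - 1)) != 0) abort();
    if (pthread_create(&t2, NULL, own_join_fib, (void *)(n - 2)) != 0) abort();

    /* Join our own children and collect their results. */
    if (pthread_join(t1, &r1) != 0) abort();
    if (pthread_join(t2, &r2) != 0) abort();
    return (void *)((intptr_t)r1 + (intptr_t)r2);
}

/* Hypothetical entry point: run the root thread and return its result. */
static intptr_t threaded_fib(intptr_t n) {
    pthread_t root;
    void *result;
    if (pthread_create(&root, NULL, own_join_fib, (void *)n) != 0) abort();
    if (pthread_join(root, &result) != 0) abort();
    return (intptr_t)result;
}
```

Because every pthread_t is joined by the thread that created it, the handle being joined is always one returned by pthread_create in the same function, rather than one published through shared memory by another thread.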

Here is some pseudocode of the core behavior:

Fibonacci(N):
    If N < 2:
        signal_parent(N)
    Else:
        sum = 0
        pthread_create(A, Fibonacci, N-1)
        pthread_create(B, Fibonacci, N-2)
        sum += suspend_and_join_child(); // not necessarily thread A
        sum += suspend_and_join_child(); // not necessarily thread B
        signal_parent(sum)

A minimal working example in C is included below.

¹ Apple LLVM version 7.0.0 (clang-700.1.76), Target: x86_64-apple-darwin14.5.0
² gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

#include <assert.h>
#include <pthread.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <unistd.h>

#define N 10

#define RCHECK(expr)                                     \
    do {                                                 \
        int _rcheck_expr_return_value = expr;            \
        if (_rcheck_expr_return_value != 0) {            \
            fprintf(stderr, "FAILED CALL: " #expr "\n"); \
            abort();                                     \
        }                                                \
    } while (0);

typedef struct wait_state_st {
    volatile intptr_t val;
    pthread_t other;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct wait_state_st *next;
} wait_state;

wait_state *volatile wait_list_head = NULL;
volatile int active = 0;

static inline void push_thread(wait_state *ws) {
    do {
        ws->next = wait_list_head;
    } while (!__sync_bool_compare_and_swap(&wait_list_head, ws->next, ws));
}

static inline wait_state *pop_thread(void) {
    wait_state *ws, *next;
    do {
        ws = wait_list_head;
        while (!ws) {
            usleep(1000);
            ws = wait_list_head;
        }
        next = ws->next;
    } while (!__sync_bool_compare_and_swap(&wait_list_head, ws, next));
    assert(ws->next == next); // check for ABA problem
    ws->next = NULL;
    return ws;
}

intptr_t thread_suspend(int count) {
    intptr_t sum = 0;
    // WAIT TO BE WOKEN UP "count" TIMES
    for (int i = 0; i < count; i++) {
        wait_state ws;
        ws.val = -1;
        ws.other = pthread_self();
        RCHECK(pthread_mutex_init(&ws.lock, NULL));
        RCHECK(pthread_cond_init(&ws.cond, NULL));

        RCHECK(pthread_mutex_lock(&ws.lock));

        push_thread(&ws);

        while (ws.val < 0) {
            RCHECK(pthread_cond_wait(&ws.cond, &ws.lock));
        }

        assert(ws.other != pthread_self());
        pthread_join(ws.other, NULL);

        sum += ws.val;

        RCHECK(pthread_mutex_unlock(&ws.lock));
    }
    return sum;
}

void thread_signal(intptr_t x) {
    // wake up the suspended thread
    __sync_fetch_and_add(&active, -1);
    wait_state *ws = pop_thread();
    RCHECK(pthread_mutex_lock(&ws->lock));
    ws->val = x;
    ws->other = pthread_self();
    RCHECK(pthread_cond_signal(&ws->cond));
    RCHECK(pthread_mutex_unlock(&ws->lock));
}

void *fib(void *arg) {
    intptr_t n = (intptr_t)arg;
    if (n > 1) {
        pthread_t t1, t2;
        __sync_fetch_and_add(&active, 2);
        RCHECK(pthread_create(&t1, NULL, fib, (void *)(n - 1)));
        RCHECK(pthread_create(&t2, NULL, fib, (void *)(n - 2)));
        intptr_t sum = thread_suspend(2);
        thread_signal(sum);
    }
    else {
        thread_signal(n);
    }
    return NULL;
}

intptr_t pure_fib(intptr_t n) {
    if (n < 2) return n;
    return pure_fib(n-1) + pure_fib(n-2);
}

int main(int argc, char *argv[]) {
    printf("EXPECTED = %" PRIdPTR "\n", pure_fib(N));
    assert("START" && wait_list_head == NULL);

    active = 1;

    pthread_t t;
    RCHECK(pthread_create(&t, NULL, fib, (void *)N));

    while (active > 0) { usleep(100000); }
    intptr_t sum = thread_suspend(1);

    printf("SUM      = %" PRIdPTR "\n", sum);
    printf("DONE %p\n", wait_list_head);

    assert("END" && wait_list_head == NULL);

    return 0;
}

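Update: This Gist contains a slight variation of the above code that uses a global mutex for all thread push/pop operations, thus avoiding the possible ABA problem with the CAS above. This version of the code still segfaults regularly, but only about 30-50% of the time rather than 99% of the time as with the code above.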
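The Gist itself is not reproduced here, so the following is only a guess at what mutex-guarded push/pop operations could look like. The node type and the names list_head, list_lock, locked_push and locked_pop are hypothetical, and unlike pop_thread above, this pop returns NULL on an empty list instead of sleeping and retrying.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node standing in for wait_state. */
typedef struct node_st {
    int id;
    struct node_st *next;
} node;

static node *list_head = NULL;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push under a global lock: no CAS loop, so no ABA window. */
static void locked_push(node *n) {
    pthread_mutex_lock(&list_lock);
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&list_lock);
}

/* Pop under the same lock; returns NULL when the list is empty. */
static node *locked_pop(void) {
    pthread_mutex_lock(&list_lock);
    node *n = list_head;
    if (n != NULL) {
        list_head = n->next;
        n->next = NULL;
    }
    pthread_mutex_unlock(&list_lock);
    return n;
}
```

Since the head pointer is only read and written while list_lock is held, a node that is popped and pushed again at the same address cannot be confused with its earlier incarnation.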

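Again, I think this must be a problem with the pthreads library running out of resources when threads are not joined/destroyed quickly enough, but I don't know how to confirm that.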
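One way to gather evidence for or against the resource-exhaustion hypothesis is to log the actual error code, since pthread functions return an error number rather than setting errno, and POSIX specifies EAGAIN for pthread_create when resources are insufficient. The sketch below assumes a hypothetical helper named report_rc; it would not explain a crash inside the library, only a failing call.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a pthread return code in readable form
 * and pass it through unchanged so the caller can still act on it. */
static int report_rc(const char *what, int rc) {
    if (rc != 0) {
        fprintf(stderr, "%s failed: %d (%s)%s\n", what, rc, strerror(rc),
                rc == EAGAIN ? " [insufficient resources or thread limit]" : "");
    }
    return rc;
}
```

It could wrap a call such as report_rc("pthread_create", pthread_create(&t, NULL, fib, arg)), and the same wrapper applied to the unchecked pthread_join in thread_suspend would reveal whether that call ever returns an error.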

Answer 1

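I looked at this for a couple of hours because I wanted to know the solution.

What I found is that the code is overrunning the stack and the thread-private data, so it overwrites thread IDs. The linked list in the code points to, and uses, the addresses of stack variables. The code only works because of the timing of the threads and the number of threads spawned.

If this spawned fewer than 20 threads, the linked-list memory would not stomp on other data; it all comes down to memory layout and how the threads get killed. As long as the program terminates before a clobbered thread wakes up, it's fine.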
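If the lifetime of stack-allocated list nodes is the concern, one way to rule it out would be to allocate each wait node on the heap and destroy it explicitly. The sketch below is an assumption about how that could be done, not a confirmed fix; heap_wait_state, wait_state_new and wait_state_free are hypothetical names modeled on the wait_state struct in the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap-allocated counterpart of wait_state. */
typedef struct heap_wait_state_st {
    intptr_t val;
    pthread_t other;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct heap_wait_state_st *next;
} heap_wait_state;

/* Allocate and initialize a node; returns NULL on any failure. */
static heap_wait_state *wait_state_new(void) {
    heap_wait_state *ws = malloc(sizeof *ws);
    if (ws == NULL) return NULL;
    ws->val = -1;
    ws->other = pthread_self();
    ws->next = NULL;
    if (pthread_mutex_init(&ws->lock, NULL) != 0) {
        free(ws);
        return NULL;
    }
    if (pthread_cond_init(&ws->cond, NULL) != 0) {
        pthread_mutex_destroy(&ws->lock);
        free(ws);
        return NULL;
    }
    return ws;
}

/* Destroy the synchronization objects before releasing the memory.
 * The mutex must be unlocked and no thread may still be using the node. */
static void wait_state_free(heap_wait_state *ws) {
    if (ws == NULL) return;
    pthread_cond_destroy(&ws->cond);
    pthread_mutex_destroy(&ws->lock);
    free(ws);
}
```

With this layout a node's address no longer depends on any thread's stack frame, and the mutex and condition variable are destroyed rather than re-initialized in place on every loop iteration.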

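The reason it works on Linux but not on OS X is probably luck with the memory layout, combined with the time spent spinning in the usleep() loops.

The use of usleep in a multithreaded application should be reviewed.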
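As an illustration of an alternative to polling with usleep, the active-thread counter could be guarded by a mutex and paired with a condition variable, so the waiter sleeps until the count actually reaches zero. This is a sketch under that assumption; active_counter, counter_add, counter_wait_zero and counter_worker are hypothetical names, not part of the original code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical replacement for the volatile int `active`. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t zero;
    int count;
} active_counter;

static active_counter active_threads = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Adjust the counter and wake all waiters when it drops to zero. */
static void counter_add(active_counter *c, int delta) {
    pthread_mutex_lock(&c->lock);
    c->count += delta;
    if (c->count == 0) pthread_cond_broadcast(&c->zero);
    pthread_mutex_unlock(&c->lock);
}

/* Sleep until the counter is no longer positive; returns its value. */
static int counter_wait_zero(active_counter *c) {
    pthread_mutex_lock(&c->lock);
    while (c->count > 0) pthread_cond_wait(&c->zero, &c->lock);
    int final_count = c->count;
    pthread_mutex_unlock(&c->lock);
    return final_count;
}

/* Hypothetical worker: marks itself finished by decrementing. */
static void *counter_worker(void *arg) {
    counter_add((active_counter *)arg, -1);
    return NULL;
}
```

The while loop around pthread_cond_wait handles spurious wakeups, and the waiter returns as soon as the last worker decrements instead of after the next polling interval.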

This is discussed at length in many sources:

https://computing.llnl.gov/tutorials/pthreads/#Overview

https://en.wikipedia.org/wiki/ABA_problem

and W. R. Stevens, "Unix Network Programming, Vol. 1", Chapter 23 specifically.

Reading these resources will explain why this code doesn't work and how it should work.

Why does this pthreads code segfault consistently on OS X but not Linux?
