Question

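In the context of an existing multithreaded application, I want to suspend a list of threads for a specific duration and then let them resume normal execution. I know some people will say I shouldn't do this, and I know that, but I have no choice.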

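I came up with the following code, but it fails randomly. For each thread I want to suspend, I send a signal and wait for an acknowledgment through a semaphore. When invoked, the signal handler posts the semaphore and then sleeps for the specified duration.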

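The problem is that when the system is fully loaded, the call to sem_timedwait sometimes fails with ETIMEDOUT, and I am left with inconsistent logic around the semaphore used for the ack: I don't know whether the signal was dropped or is just late.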

// compiled with: gcc main.c -o test -pthread

#include <pthread.h>
#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <time.h>

#define NUMTHREADS 40
#define SUSPEND_SIG (SIGRTMIN+1)
#define SUSPEND_DURATION  80 // in ms

static sem_t sem;

void checkResults(const char *msg, int rc) {
    if (rc == 0) {
        //printf("%s success\n", msg);
    } else if (rc == ESRCH) {
        printf("%s failed with ESRCH\n", msg);
    } else if (rc == EINVAL) {
        printf("%s failed with EINVAL\n", msg);
    } else {
        printf("%s failed with unknown error: %d\n", msg, rc);
    }
}

static void suspend_handler(int signo) {
    sem_post(&sem);
    usleep(SUSPEND_DURATION*1000);
}

void installSuspendHandler() {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));

    sigemptyset(&sa.sa_mask);

    sa.sa_flags = 0;
    sa.sa_handler = suspend_handler;

    int rc = sigaction(SUSPEND_SIG, &sa, NULL);
    checkResults("sigaction SUSPEND", rc);
}

void *threadfunc(void *param) {
    int tid = *((int *) param);
    free(param);

    printf("Thread %d entered\n", tid);

    // this is an example workload, the real app is doing many things
    while (1) {
        int rc = sleep(30);

        if (rc != 0 && errno == EINTR) {
            //printf("Thread %d got a signal delivered to it\n", tid);
        } else {
            //printf("Thread %d did not get expected results! rc=%d, errno=%d\n", tid, rc, errno);
        }
    }

    return NULL;
}

int main(int argc, char **argv) {
    pthread_t threads[NUMTHREADS];
    int i;

    sem_init(&sem, 0, 0);

    installSuspendHandler();

    for(i=0; i<NUMTHREADS; ++i) {
        int *arg = malloc(sizeof(*arg));
        if ( arg == NULL ) {
            fprintf(stderr, "Couldn't allocate memory for thread arg.\n");
            exit(EXIT_FAILURE);
        }

        *arg = i;
        int rc = pthread_create(&threads[i], NULL, threadfunc, arg);
        checkResults("pthread_create()", rc);
    }

    sleep(3);

    printf("Will start to send signals...\n");

    while (1) {
        printf("***********************************************\n");
        for(i=0; i<NUMTHREADS; ++i) {
            int rc = pthread_kill(threads[i], SUSPEND_SIG);
            checkResults("pthread_kill()", rc);

            printf("Waiting for Semaphore for thread %d ...\n", i);

            // compute timeout abs timestamp for ack
            struct timespec ts;
            clock_gettime(CLOCK_REALTIME, &ts);
            const int TIMEOUT = SUSPEND_DURATION*1000*1000; // in nano-seconds

            ts.tv_nsec += TIMEOUT; // timeout to receive ack from signal handler

            // normalize timespec
            ts.tv_sec += ts.tv_nsec / 1000000000;
            ts.tv_nsec %= 1000000000;

            rc = sem_timedwait(&sem, &ts); // try decrement semaphore

            if (rc == -1 && errno == ETIMEDOUT) {
                // timeout
                // semaphore is out of sync
                printf("Did not receive the signal handler's sem_post within the %d ms timeout for thread %d\n", TIMEOUT/1000000, i);
                abort();
            }
            checkResults("sem_timedwait", rc == 0 ? 0 : errno); // sem_timedwait returns -1 and sets errno
            printf("Received Semaphore for thread %d.\n", i);
        }

        sleep(1);
    }

    for(i=0; i<NUMTHREADS; ++i) {
        int rc = pthread_join(threads[i], NULL);
        checkResults("pthread_join()\n", rc);
    }
    printf("Main completed\n");
    return 0;
}

Questions:

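Is it possible for a signal to be dropped and never delivered?
What could cause the semaphore to time out at random times when the system is loaded?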

Answer 1

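usleep() is not among the async-signal-safe functions (although sleep() is, and there are other async-signal-safe functions with which you can produce a timed delay). A program that calls usleep() from a signal handler is therefore non-conforming. The specifications do not describe what may happen, neither for such a call itself nor for the larger program execution in which it occurs. Your questions can be answered only for a conforming program; I do that below.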

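Is it possible for a signal to be dropped and never delivered?

It depends on what exactly you mean:

If a normal (non-real-time) signal is sent to a thread that already has that signal pending, no additional instance is queued.
A thread can die with signals still queued for it; those signals will not be handled.
A thread can change the disposition of a given signal (to SIG_IGN, for example), although this is a per-process attribute, not a per-thread one.
A thread can block a signal indefinitely. A blocked signal is not dropped: it remains queued for the thread and will eventually be received some time after it is unblocked, if that ever happens.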

But no: once a signal has been successfully queued via kill() or pthread_kill(), it will not be randomly dropped.

What could cause the semaphore to time out at random times when the system is loaded?

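A thread can receive a signal only while it is actually running on a core. On a system with more runnable processes than cores, some runnable processes must be waiting, without a time slice on any core, at any given time; on a heavily loaded system, that is the norm. Signals are asynchronous, so you can send one to a thread that is currently waiting for a time slice without the sender blocking. It is therefore entirely possible that the thread you signaled does not get scheduled to run before your timeout expires. And if it does run, it may have the signal blocked for some reason and not unblock it before it uses up its time slice.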

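Ultimately, you can use your semaphore-based approach to check whether the target thread handles the signal within whatever timeout you choose, but you cannot predict in advance how long the thread will take to handle the signal, or even whether it will do so in any finite amount of time (for example, it might die for some reason before doing so).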

How to correctly suspend multiple threads with POSIX signals?
