Question

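I'm writing a program in C that uses POSIX threads, with a fixed number of threads.

How can I get notified if a thread is terminated because of some error?

Is there a signal that can detect it?

If so, can the signal handler create a new thread to keep the number of threads the same?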

Answer 1

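Make the threads detached.
Get them to handle errors gracefully, i.e. release mutexes, close files, etc.

Then you won't have any problems.

Perhaps send a SIGUSR1 signal to the main thread to tell it that things have gone pear-shaped (to put it politely!).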

Answer 2

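Create each thread through an intermediate function, passing it a pointer to the real thread function. Start the intermediate function asynchronously and have it call the passed function synchronously. When that function returns or throws an exception, you can handle the result any way you like.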
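C has no exceptions, so in the sketch below "throws" simply means that the real thread function returns an error value. The intermediate function (trampoline) calls the real body, then posts its id and return value to a one-slot mailbox guarded by a mutex and a condition variable, which a supervisor waits on. The names struct job, trampoline, spawn, wait_for_any and sample_body are hypothetical; this is one possible sketch, not the answerer's own code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job descriptor handed to the intermediate function. */
struct job {
    int id;
    void *(*fn)(void *);   /* the real thread body */
    void *arg;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int finished_id = -1;      /* single-slot mailbox: -1 means empty */
static void *finished_result;

/* Intermediate function: calls the real body synchronously, then reports. */
static void *trampoline(void *p)
{
    struct job *j = p;
    void *res = j->fn(j->arg);          /* returns when the body returns */

    pthread_mutex_lock(&lock);
    while (finished_id != -1)           /* wait until the supervisor empties the slot */
        pthread_cond_wait(&cond, &lock);
    finished_id = j->id;
    finished_result = res;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    free(j);
    return NULL;
}

/* Start fn(arg) asynchronously (detached) through the trampoline. */
static int spawn(int id, void *(*fn)(void *), void *arg)
{
    struct job *j = malloc(sizeof *j);
    if (j == NULL)
        return -1;
    j->id = id;
    j->fn = fn;
    j->arg = arg;

    pthread_attr_t attr;
    pthread_t tid;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, trampoline, j);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(j);
        return -1;
    }
    return 0;
}

/* Supervisor: blocks until some job ends; returns its id and result. */
static int wait_for_any(void **result)
{
    pthread_mutex_lock(&lock);
    while (finished_id == -1)
        pthread_cond_wait(&cond, &lock);
    int id = finished_id;
    *result = finished_result;
    finished_id = -1;                   /* empty the slot */
    pthread_cond_broadcast(&cond);      /* let a waiting trampoline report */
    pthread_mutex_unlock(&lock);
    return id;
}

/* Example body: "fails" by returning its argument as the error value. */
static void *sample_body(void *arg)
{
    return arg;
}
```

For example, spawn(7, sample_body, &x) followed by wait_for_any(&r) returns 7 with r == &x. A trampoline that finds the slot occupied waits until the supervisor empties it, so no report is lost; a program with many workers might prefer a real queue.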

Answer 3

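Based on the latest input you provided, I suggest doing something like the following to get the number of threads a particular process has started: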

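#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() */
#include <stdio.h>

#define THRESHOLD 50

int main(void)
{
    unsigned count = 0;
    FILE *a;

    /* List every thread (ps H) of the process named a.out and count the lines. */
    a = popen("ps H `ps -A | grep a.out | awk '{print $1}'` | wc -l", "r");
    if (a == NULL) {
        printf("Error in executing command\n");
        return 1;
    }

    /* %u matches the unsigned variable; stop if nothing could be read. */
    if (fscanf(a, "%u", &count) != 1) {
        printf("Could not read the thread count\n");
        pclose(a);
        return 1;
    }
    pclose(a);

    if (count < THRESHOLD)
    {
        printf("Number of threads = %u\n", count - 1);
            // count - 1 in order to eliminate header.
            // count - 2 if you don't want to include the main thread

        /* Take action. Maybe start a new thread, etc. */
    }

    return 0;
}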

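Notes:

ps H shows all threads.
$1 prints the first column, which holds the PID on my system (Ubuntu). The column number may differ on other systems.
Replace a.out with your process name.
The backticks evaluate the expression inside them and give you the PID of the process. We rely on the fact that all POSIX threads of a process share the same PID.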
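If a process only needs to count its own threads, a Linux-specific alternative that avoids spawning ps and depending on column positions is to count the entries of /proc/self/task, which holds one directory per thread. This sketch assumes Linux with procfs mounted; the function name count_threads is hypothetical.

```c
#include <dirent.h>
#include <stddef.h>

/* Linux-specific: every thread of the calling process has a directory
   /proc/self/task/<tid>. Returns the thread count, or -1 on error. */
static int count_threads(void)
{
    DIR *d = opendir("/proc/self/task");
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')        /* skip "." and ".." */
            n++;
    closedir(d);
    return n;
}
```

The result already includes the main thread and has no header line, so it can be compared with THRESHOLD directly.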

Answer 4

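I doubt that Linux sends a signal when a thread dies or exits for whatever reason, but you can do it manually.

First, let's consider the two ways a thread can end:

It terminates itself
It dies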

In the first case, the thread itself can tell someone (say, a thread manager) that it is terminating. The thread manager then spawns another thread.
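One way to implement "the thread tells the thread manager" is a pipe: each worker writes its own id to the pipe just before it returns, and the manager blocks in read() and respawns whatever id it receives. This is a sketch under that assumption; report_fd, self_reporting_worker and manager_wait are hypothetical names, and the manager must call pipe(report_fd) before starting any worker.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical thread-manager channel: report_fd[0] is the read end,
   report_fd[1] the write end. Created with pipe(report_fd). */
static int report_fd[2];

/* Worker: does its work, then tells the manager it is terminating. */
static void *self_reporting_worker(void *arg)
{
    int id = *(int *)arg;
    /* ... the thread's real work goes here ... */

    /* A write of at most PIPE_BUF bytes to a pipe is atomic, so reports
       from several workers never interleave. */
    ssize_t w = write(report_fd[1], &id, sizeof id);
    (void)w;
    return NULL;
}

/* Manager: blocks until a worker reports and returns that worker's id
   (the manager would then spawn a replacement), or -1 on error. */
static int manager_wait(void)
{
    int id;
    if (read(report_fd[0], &id, sizeof id) != (ssize_t)sizeof id)
        return -1;
    return id;
}
```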

In the second case, a watchdog thread can keep track of whether each thread is still alive. It works more or less like this:

Thread:
    while (do stuff)
        this_thread->is_alive = true
        work

Watchdog:
    for all threads t
        t->timeout = 0
    while (true)
        for all threads t
            if t->is_alive
                t->timeout = 0
                t->is_alive = false
            else
                ++t->timeout
                if t->timeout > THRESHOLD
                    Thread has died! Tell the thread manager to respawn it
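The watchdog pseudocode above might translate to C roughly as follows. The sketch assumes C11 atomics for the is_alive flag (the worker sets it, the watchdog reads and clears it in one step), keeps timeout private to the watchdog, and uses an assumed THRESHOLD of 3 missed scans. struct slot, heartbeat and watchdog_scan are hypothetical names; the watchdog thread would call watchdog_scan() once per period, for example after a sleep().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define THRESHOLD 3   /* missed scans before a thread is declared dead (assumed value) */

/* Per-thread heartbeat slot; hypothetical layout for this sketch. */
struct slot {
    atomic_bool is_alive;   /* set by the worker on every loop iteration */
    int timeout;            /* touched only by the watchdog */
};

/* Worker side: call once per unit of work. */
static void heartbeat(struct slot *s)
{
    atomic_store(&s->is_alive, true);
}

/* Watchdog side: one pass over all slots. Returns how many threads
   crossed the threshold during this pass (those need respawning). */
static int watchdog_scan(struct slot *slots, int n)
{
    int dead = 0;
    for (int i = 0; i < n; i++) {
        struct slot *t = &slots[i];
        /* atomic_exchange reads the flag and clears it in one step */
        if (atomic_exchange(&t->is_alive, false)) {
            t->timeout = 0;
        } else if (++t->timeout > THRESHOLD) {
            dead++;              /* thread has died: tell the manager to respawn it */
            t->timeout = 0;      /* reset so the replacement gets a fresh window */
        }
    }
    return dead;
}
```

With one thread that calls heartbeat() before every scan and one that never does, the first three scans return 0 and the fourth returns 1. Resetting timeout after a detection is a design choice of this sketch, so that the same dead slot is not reported on every later scan.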

Answer 5

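If, for whatever reason, people cannot go for Ed Heal's "just make it work properly" approach (which, by the way, is my favourite answer to the OP's question), the lazy fox might take a look at the pthread_cleanup_push() and pthread_cleanup_pop() macros and consider putting the whole body of the thread function between those two macros.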
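A sketch of that "lazy" approach: the whole thread body sits between pthread_cleanup_push() and pthread_cleanup_pop(1), so the handler runs whether the thread finishes normally, calls pthread_exit(), or is cancelled. Two caveats: leaving the push/pop block with return is undefined behavior, so error paths should use pthread_exit(); and no handler runs when a thread crashes (for example with SIGSEGV), because a fatal signal terminates the whole process, not just one thread. The names on_thread_exit, guarded_body, exits_seen and last_exit_id are hypothetical.

```c
#include <pthread.h>

/* Hypothetical bookkeeping a thread manager could watch. */
static pthread_mutex_t exit_lock = PTHREAD_MUTEX_INITIALIZER;
static int exits_seen = 0;
static int last_exit_id = -1;

/* Cleanup handler: runs on pthread_exit(), on cancellation,
   and on pthread_cleanup_pop(1). */
static void on_thread_exit(void *arg)
{
    pthread_mutex_lock(&exit_lock);
    exits_seen++;
    last_exit_id = *(int *)arg;      /* a manager would respawn this slot */
    pthread_mutex_unlock(&exit_lock);
}

/* Thread function whose whole body sits between push and pop. */
static void *guarded_body(void *arg)
{
    pthread_cleanup_push(on_thread_exit, arg);

    int error = 1;                   /* pretend something went wrong */
    if (error)
        pthread_exit(NULL);          /* runs on_thread_exit before the thread ends */

    pthread_cleanup_pop(1);          /* normal path: run the handler too */
    return NULL;
}
```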

Answer 6

The clean way to know whether a thread has finished is to call pthread_join() on that thread.

// int pthread_join(pthread_t thread, void **retval);
void *retval = NULL;   // pthread_join() expects a void **, not an int *
int r = pthread_join(that_thread_id, &retval);
// ... here you know that_thread_id returned ...

The problem with pthread_join() is that if the thread never returns (it keeps running, as intended), you will block. So in your case it is not very useful.

However, you can check whether the thread can be joined without blocking (a "try join"), like this:

// int pthread_tryjoin_np(pthread_t thread, void **retval);  (GNU extension: needs _GNU_SOURCE)
void *retval = NULL;
int r = pthread_tryjoin_np(that_thread_id, &retval);

// here 'r' tells you whether the thread returned (joined) or not.
if (r == 0)
{
   // that_thread_id is done, create new thread here
   ...
}
else if (r != EBUSY)   // pthread functions return the error code; they do not set errno
{
   // react to "weird" errors... (maybe print strerror(r) at least?)
}
// else -- thread is still running

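There is also a timed join, which waits up to the time you specify, for example a few seconds. Depending on how many threads you need to check and whether your main process otherwise just sits around, it may be a solution. But in the worst case (every thread still running), blocking 5 seconds on thread 1, then 5 seconds on thread 2, and so on, means that for 1,000 threads each loop takes 5,000 seconds (roughly 83 minutes to go around all the threads, plus whatever time is needed to manage things...).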

The man page has example code showing how to use the pthread_timedjoin_np() function. All you need to do is wrap it in a for loop that checks each of your threads.

struct timespec ts;
int s;

...

if (clock_gettime(CLOCK_REALTIME, &ts) == -1) {
    /* Handle error */
}

ts.tv_sec += 5;   /* absolute deadline: now + 5 seconds */

s = pthread_timedjoin_np(thread, NULL, &ts);
if (s != 0) {
    /* Handle error; s == ETIMEDOUT means the thread is still running */
}

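If your main process has other things to do, I'd suggest not using the timed version and instead going through all the threads as quickly as possible.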
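Following that advice, one non-blocking pass over all the threads might look like the sketch below. It assumes glibc (pthread_tryjoin_np() is a GNU extension, hence _GNU_SOURCE) and checks the return value r rather than errno, since pthread functions return their error code. The names respawn_finished and quick_body and the active[] array are hypothetical; active[i] records whether tids[i] currently holds a joinable thread, so a slot whose respawn failed is never joined twice.

```c
#define _GNU_SOURCE            /* pthread_tryjoin_np() is a GNU extension */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* One non-blocking pass over all worker slots. A slot whose thread has
   finished is joined and refilled with a new thread running body(arg).
   Returns the number of threads respawned during this pass. */
static int respawn_finished(pthread_t *tids, bool *active, int n,
                            void *(*body)(void *), void *arg)
{
    int respawned = 0;
    for (int i = 0; i < n; i++) {
        if (!active[i])
            continue;
        int r = pthread_tryjoin_np(tids[i], NULL);
        if (r == EBUSY)
            continue;                          /* still running */
        if (r != 0) {                          /* "weird" error: report it */
            fprintf(stderr, "slot %d: pthread_tryjoin_np failed (%d)\n", i, r);
            continue;
        }
        active[i] = false;                     /* joined: slot is free */
        if (pthread_create(&tids[i], NULL, body, arg) == 0) {
            active[i] = true;
            respawned++;
        } else {
            fprintf(stderr, "slot %d: respawn failed\n", i);
        }
    }
    return respawned;
}

/* Example body that finishes immediately. */
static void *quick_body(void *arg)
{
    (void)arg;
    return NULL;
}
```

The main loop would call respawn_finished() between its other tasks, as often as it can afford.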
