Question

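I have a problem. I need to implement a program that switches between ucontext threads using a timer and SIGALRM, but I get a segmentation fault when I switch threads with my evict_thread function. I believe it is the result of a race condition, because it occurs at different points during program execution. Here is my evict_thread: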

void evict_thread(int signal)
{
    // Check that there is more than one thread in the queue
    if ((int)list_length(runqueue) > 1)
    {
        // Remove the currently executing thread from the runqueue and store its id
        int evict_thread_id = list_shift_int(runqueue);

        // Place the thread at the back of the run queue
        list_append_int(runqueue, evict_thread_id);

        // Get the id of the thread that is now at the head of the run queue
        int exec_thread_id = list_item_int(runqueue, 0);

        // Set the start time for new thread to the current time
        clock_gettime(CLOCK_REALTIME, &thread_table[exec_thread_id]->start);

        printf("Switching context from %s to %s\n",
            thread_table[evict_thread_id]->thread_name,
            thread_table[exec_thread_id]->thread_name);

        // Execute the thread at the head of the run queue
        if (swapcontext(&thread_table[evict_thread_id]->context, &thread_table[exec_thread_id]->context) == -1)
        {
            perror("swapcontext failed\n");
            printf("errno: %d.\n", errno);
            return;
        }
    }
    return;
}

The function above is installed as the SIGALRM handler and triggered by a timer as follows:

// Set the SIGALRM
if (sigset(SIGALRM, evict_thread) == -1)
{
    perror("sigset failed\n");
    printf("errno: %d.\n", errno);
    return;
}

// Initialize timer
thread_switcher.it_interval.tv_sec  = 0;
thread_switcher.it_interval.tv_usec = quantum_size;
thread_switcher.it_value.tv_sec = 0;
thread_switcher.it_value.tv_usec =  quantum_size;
setitimer(ITIMER_REAL, &thread_switcher, 0);

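The run queue is simply a global list of integers, each of which is an index into a global table of pointers to ucontext threads. The list is implemented with the list data structure from the C general-purpose utility library available at libslack.org.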

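When I disable the timer and let each thread run to completion before switching context, the program runs correctly. When threads are switched during execution, however, I get a segmentation fault about 80% of the time.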

Also, when I use gdb to get a backtrace of the segmentation fault, it reports that the fault occurred inside a system call.

Answer 1

I can't give any advice on how to make this work, but here are some points about what doesn't work:

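Signal handlers run asynchronously with respect to your other code. For example, the signal could arrive while some code is in the middle of updating runqueue, and the handler then runs list_append_int(runqueue, evict_thread_id); on a half-updated list. That is quite a serious race condition.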

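printf() should not be called in a signal handler; it could deadlock or worse. POSIX defines a list of functions that are safe to call from a signal handler. setcontext/swapcontext are not listed there as safe, although the Linux man page says you can call setcontext() in a signal handler. I'm not sure which source is authoritative on this.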

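Also note what the man page for setcontext() says:

When a signal occurs, the current user context is saved and a new context is created by the kernel for the signal handler.

So when you call swapcontext(), you are probably saving the context of the signal handler, rather than the context that was running before the signal arrived.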

Answer 2

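Remember that signal handlers run asynchronously with respect to your main code. The man 7 signal page is worth reading carefully to make sure you follow its guidelines. For example, the "Async-signal-safe functions" section does not mention printf or other functions such as swapcontext. This means you cannot reliably call those functions from a signal handler.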

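In general, try to do as little as possible inside a signal handler. Usually this just means setting a flag of type sig_atomic_t in the handler and then checking the state of that flag in your main loop.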

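Perhaps rearrange your code so that the context switch happens in the main loop rather than in the signal handler. You may be able to use sigwait in the main loop to wait for the timer signal.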

Answer 3

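As a guess: because you switched contexts, you are passing the kernel data that is not visible to it. You are asking about a segfault, but your code is doing some unusual things.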

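If you consider a more standard model for thread scheduling, you may be able to avoid these problems. Instead of using context switches to schedule threads, there are other ways to do it, and you can call them from evict_thread while keeping exactly the same program model you have now.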

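Some of this advice is somewhat system specific. If you can tell us which operating system you are using, we can find something suited to your situation. Or you can check for yourself.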

Read up on POSIX thread scheduling. Pay particular attention to SCHED_FIFO, which will work with your model.

https://computing.llnl.gov/tutorials/pthreads/man/sched_setscheduler.txt

This applies generally to using the POSIX thread library to schedule threads, rather than trying to do it the hard way.
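As a rough illustration of the call documented at the link above, the sketch below asks the kernel to run the caller under SCHED_FIFO at the lowest real-time priority. The function name try_enable_fifo is hypothetical. Real-time policies usually require elevated privileges, so the call commonly fails with EPERM for an ordinary user, and on Linux it affects the calling thread rather than the whole process.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Tries to switch the caller to SCHED_FIFO at the lowest real-time
 * priority. Returns 0 on success, or the errno value on failure
 * (commonly EPERM when the caller lacks the required privilege). */
int try_enable_fifo(void)
{
    struct sched_param param;
    int lowest = sched_get_priority_min(SCHED_FIFO);
    int highest = sched_get_priority_max(SCHED_FIFO);

    if (lowest == -1 || highest == -1)
        return errno;
    assert(lowest <= highest);

    param.sched_priority = lowest;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}
```

A SCHED_FIFO thread runs until it blocks, yields, or is preempted by a higher-priority thread, which is why this policy resembles a run-to-completion queue more than timer-driven time slicing.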

Switching thread contexts using SIGALRM
