Question

My program contains the following code.

Thread* t = arg->thread;
//at this point, the new thread is being executed.
t->myId = TGetId();
void* (*functor)(void*) = t->functor;
void* fArg = arg->arg;
nfree(arg);
_INFO_PRINTF(1, "Launching thread with ID: %d", t->myId);
sigset_t mask;
sigfillset(&mask);         //fill mask with all signals
sigdelset(&mask, SIGUSR1); // allow SIGUSR1 to get to the thread.
sigdelset(&mask, SIGUSR2); // allow SIGUSR2 to get to the thread.
pthread_sigmask(SIG_SETMASK, &mask, NULL); // block everything except SIGUSR1 and SIGUSR2

struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_handler = TSignalHandler;
act.sa_mask = mask;
if(sigaction(SIGUSR1, &act, NULL))
{
    _ERROR_PRINT(1, "Could not set signal action.");
    return NULL;
}
if(sigaction(SIGUSR2, &act, NULL))
{
    _ERROR_PRINT(1, "Could not set signal action.");
    return NULL;
}
void* ret = functor(fArg);
t->hasReturned = true;
return ret;

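On native Linux, the thread executing this code calls the signal handler correctly. The problem is that on the Windows Subsystem for Linux (WSL), the program hangs when SIGUSR1 or SIGUSR2 is sent to the thread with pthread_kill. Why does this work on native Ubuntu (run under VMware Workstation 14), Debian, and Fedora, but not on WSL?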
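To narrow the problem down, it may help to reproduce it outside the full program. The following is a minimal sketch, not the asker's code: it mirrors the mask and handler setup above for SIGUSR1 only, and the names run_signal_probe, worker, and on_signal are hypothetical. It assumes a POSIX system and linking with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical names throughout; this only mirrors the setup in the question. */
static volatile sig_atomic_t handled = 0; /* set by the handler */
static atomic_int ready = 0;              /* set once the handler is installed */

static void on_signal(int signo)
{
    (void)signo;
    handled = 1;
}

static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = 0;
    ts.tv_nsec = ms * 1000L * 1000L;
    nanosleep(&ts, NULL); /* may return early with EINTR; that is fine here */
}

static void *worker(void *unused)
{
    sigset_t mask;
    struct sigaction act;
    int i;
    (void)unused;

    /* Block every signal except SIGUSR1 in this thread. */
    sigfillset(&mask);
    sigdelset(&mask, SIGUSR1);
    pthread_sigmask(SIG_SETMASK, &mask, NULL);

    memset(&act, 0, sizeof(act));
    act.sa_handler = on_signal;
    sigfillset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) != 0)
        return NULL;

    atomic_store(&ready, 1);

    /* Poll for up to about two seconds instead of blocking forever. */
    for (i = 0; i < 200 && !handled; i++)
        sleep_ms(10);
    return NULL;
}

/* Returns 1 if the handler ran, 0 if it did not, -1 on setup failure. */
int run_signal_probe(void)
{
    pthread_t tid;

    handled = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    while (!atomic_load(&ready))
        sleep_ms(1);
    if (pthread_kill(tid, SIGUSR1) != 0) {
        pthread_join(tid, NULL);
        return -1;
    }
    pthread_join(tid, NULL);
    return handled ? 1 : 0;
}
```

Calling run_signal_probe() from main and printing the result on both native Linux and WSL would show whether the signal delivery itself behaves differently, or whether the hang comes from something else in the program.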

Answer 1

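When you have a hang that cannot be reproduced while running under a debugger, you can attach the debugger to the running process after the hang has been reproduced. This will not let you watch the variables change on the way to the hang, but at least you can get a stack trace of where the hang occurs.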

Once you know the process ID of the hung process (say it is 12345), you can use:

$ gdb -p 12345

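Alternatively, you can kill the process with a signal that causes a core dump to be generated. I like to use SIGTRAP, since it is easy to distinguish from SIGSEGV.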

$ kill -SIGTRAP 12345

You can then use gdb on the core file to find out where the process was hung.
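A core file is only written if the core size limit permits it, and on many systems the soft limit defaults to 0. As a sketch, the program could raise its own soft limit to the hard limit at startup. The function name enable_core_dumps is hypothetical. Where the core file ends up (or whether a crash handler intercepts it) depends on system configuration that this sketch does not touch.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max; /* an unprivileged process may go up to the hard limit */
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return 0;
}
```

If the hard limit is itself 0, this call succeeds but still leaves core dumps disabled, so the limit would have to be raised in the shell or system configuration instead.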

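The advantage of attaching to the running process is that the process is still alive. This lets you call functions from the debugger, which can give you easier access to diagnostics built into your program. The advantage of a core file is that it preserves the state of the failure, which is useful if the hang is difficult to reproduce.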
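As an illustration of a built-in diagnostic that can be called from an attached debugger, here is a minimal sketch. The table and the names diag_register and diag_dump are hypothetical and are not part of the program in the question. The sketch is not thread-safe; it assumes registration happens before the threads start.

```c
#include <stdio.h>

#define DIAG_MAX_THREADS 16

/* Hypothetical per-thread record kept only for debugging. */
struct thread_diag {
    int id;
    int has_returned;
};

static struct thread_diag diag_table[DIAG_MAX_THREADS];
static int diag_count = 0;

/* Adds a record and returns its index, or -1 if the table is full. */
int diag_register(int id)
{
    if (diag_count >= DIAG_MAX_THREADS)
        return -1;
    diag_table[diag_count].id = id;
    diag_table[diag_count].has_returned = 0;
    return diag_count++;
}

/* Prints every record to stderr and returns the number of records. */
int diag_dump(void)
{
    int i;

    for (i = 0; i < diag_count; i++)
        fprintf(stderr, "thread %d: has_returned=%d\n",
                diag_table[i].id, diag_table[i].has_returned);
    return diag_count;
}
```

After attaching with gdb -p, such a function could be invoked with:

(gdb) call diag_dump()

Calling functions in a hung process carries some risk: if the function needs a lock that a stuck thread already holds, the call itself may hang, so diagnostics intended for this use are best kept free of locking.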
