How to trace a resource deadlock?

Time: 2013-08-05 13:12:29

Tags: c++ multithreading c++11

I've written a timer using std::thread - here is what it looks like:

TestbedTimer::TestbedTimer(char type, void* contextObject) : 
  Timer(type, contextObject) {
    this->active = false;
}

TestbedTimer::~TestbedTimer(){
    if (this->active) {
        this->active = false;

        if(this->timer->joinable()){

            try {
                this->timer->join();
            } catch (const std::system_error& e) {
                std::cout << "Caught system_error with code " << e.code()  << 
                              " meaning " << e.what() << '\n';
            }
        }

        if(timer != nullptr) {
            delete timer;
        }
    }
}

void TestbedTimer::run(unsigned long timeoutInMicroSeconds){
    this->active = true;
    timer = new std::thread(&TestbedTimer::sleep, this, timeoutInMicroSeconds);
}

void TestbedTimer::sleep(unsigned long timeoutInMicroSeconds){
    unsigned long interval = 500000;

    if(timeoutInMicroSeconds < interval){
        interval = timeoutInMicroSeconds;
    }

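    while((timeoutInMicroSeconds > 0) && active){
        /// never sleep longer than the remaining time; this also keeps the
        /// unsigned subtraction below from wrapping around
        if(timeoutInMicroSeconds < interval){
            interval = timeoutInMicroSeconds;
        }
        timeoutInMicroSeconds -= interval;
        /// set the sleep time
        std::chrono::microseconds duration(interval);
        /// set thread to sleep
        std::this_thread::sleep_for(duration);
    }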

    if (active) {
        this->notifyAllListeners();
    }
}

void TestbedTimer::interrupt(){
    this->active = false;
}

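I'm not really happy with this implementation, since I let the timer sleep for a short interval and then check whether the active flag has changed (but I don't know a better solution, since you can't interrupt a sleep_for call). However, my program core dumps with the following message: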

thread is joinable
Caught system_error with code generic:35 meaning Resource deadlock avoided
thread has rejoined main scope
terminate called without an active exception
Aborted (core dumped)

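I've looked up the error, and it seems that I have a thread which waits for another thread (the reason for the resource deadlock). However, I'd like to find out where exactly this happens. In my C++ code I'm using a C library (which uses pthreads) that provides, among other things, an option to run as a daemon, and I'm afraid that this interferes with my std::thread code. What's the best way to debug this?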

I've tried to use helgrind, but this hasn't helped very much (it doesn't find any errors).

TIA

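**Edit:** The code above is actually not example code, but code I've written for a routing daemon. The routing algorithm is reactive, meaning it starts a route discovery only if it has no route to a desired destination, and it does not try to build up a routing table for every host in its network. Every time a route discovery is triggered, a timer is started. If the timer expires, the daemon is notified and the packet is dropped. Basically, it looks like this: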

void Client::startNewRouteDiscovery(Packet* packet) {
    AddressPtr destination = packet->getDestination();
    ...
    startRouteDiscoveryTimer(packet);
    ...
}

void Client::startRouteDiscoveryTimer(const Packet* packet) {
    RouteDiscoveryInfo* discoveryInfo = new RouteDiscoveryInfo(packet);
    /// create a new timer of a certain type
    Timer* timer = getNewTimer(TimerType::ROUTE_DISCOVERY_TIMER, discoveryInfo);
    /// pass this object as the callback which is notified if the timer expires (the class implements an interface for that)
    timer->addTimeoutListener(this);
    /// start the timer
    timer->run(routeDiscoveryTimeoutInMilliSeconds * 1000);

    AddressPtr destination = packet->getDestination();
    runningRouteDiscoveries[destination] = timer;
}

If the timer has expired, the following method is called:

void Client::timerHasExpired(Timer* responsibleTimer) {
    char timerType = responsibleTimer->getType();
    switch (timerType) {
        ...
        case TimerType::ROUTE_DISCOVERY_TIMER:
            handleExpiredRouteDiscoveryTimer(responsibleTimer);
            return;
        ....
        default:
            // if this happens it's a bug in our code
            logError("Could not identify expired timer");
            delete responsibleTimer;
    }
}

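I hope that helps to get a better understanding of what I'm doing. However, I had no intention of bloating the question with this additional code.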

0 answers:

No answers