pthread deadlock problem when using pthread_cond_t

Time: 2013-08-04 15:54:23

Tags: pthreads deadlock

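I'm struggling to figure out why my synchronization code deadlocks when it uses the pthread library. Using WinAPI primitives instead of pthread works without problems. Using C++11 threads works fine as well (unless compiled with Visual Studio 2012 Service Pack 3, where it simply crashes; Microsoft has accepted that as a bug). Using pthread, however, proves to be a problem, at least on a Linux machine; I have not had a chance to try a different OS.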

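I wrote a simple program to illustrate the problem. The code only demonstrates the deadlock; I'm well aware that the design is poor and could be written much better.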

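#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

typedef struct _pthread_event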
{
     pthread_mutex_t Mutex;
     pthread_cond_t Condition;
     unsigned char  State;
} pthread_event;

void pthread_event_create( pthread_event * ev , unsigned char init_state )
{ 
    pthread_mutex_init( &ev->Mutex , 0 );
    pthread_cond_init( &ev->Condition , 0 );
    ev->State = init_state;
}

void pthread_event_destroy( pthread_event * ev )
{
    pthread_cond_destroy( &ev->Condition );
    pthread_mutex_destroy( &ev->Mutex );
}

void pthread_event_set( pthread_event * ev , unsigned char state )
{
    pthread_mutex_lock( &ev->Mutex );
    ev->State = state;
    pthread_mutex_unlock( &ev->Mutex );
    pthread_cond_broadcast( &ev->Condition );
}

unsigned char pthread_event_get( pthread_event * ev )
{
    unsigned char result;
    pthread_mutex_lock( &ev->Mutex );
    result = ev->State;
    pthread_mutex_unlock( &ev->Mutex );
    return result;
}

unsigned char pthread_event_wait( pthread_event * ev , unsigned char state , unsigned int timeout_ms )
{
    struct timeval time_now;
    struct timespec timeout_time;
    unsigned char result;

    gettimeofday( &time_now , NULL );
    timeout_time.tv_sec = time_now.tv_sec           + ( timeout_ms / 1000 );
    timeout_time.tv_nsec = time_now.tv_usec * 1000  + ( ( timeout_ms % 1000 ) * 1000000 );

    pthread_mutex_lock( &ev->Mutex );
    while ( ev->State != state ) 
          if ( ETIMEDOUT == pthread_cond_timedwait( &ev->Condition , &ev->Mutex , &timeout_time ) ) break;

    result = ev->State;
    pthread_mutex_unlock( &ev->Mutex );
    return result;
}

static pthread_t        thread_1;
static pthread_t        thread_2;
static pthread_event    data_ready;
static pthread_event    data_needed;

void * thread_fx1( void * c )
{
    for ( ; ; )
    {
        pthread_event_wait( &data_needed , 1 , 90 );
        pthread_event_set( &data_needed , 0 );
        usleep( 100000 );
        pthread_event_set( &data_ready , 1 );
        printf( "t1: tick\n" );
    }
}

void * thread_fx2( void * c )
{
    for ( ; ; )
    {
        pthread_event_wait( &data_ready , 1 , 50 );
        pthread_event_set( &data_ready , 0 );
        pthread_event_set( &data_needed , 1 );
        usleep( 100000 );
        printf( "t2: tick\n" );
    }
}


int main( int argc , char * argv[] )
{
    pthread_event_create( &data_ready , 0 );
    pthread_event_create( &data_needed , 0 );

    pthread_create( &thread_1 , NULL , thread_fx1 , 0 );
    pthread_create( &thread_2 , NULL , thread_fx2 , 0 );

    pthread_join( thread_1 , NULL );
    pthread_join( thread_2 , NULL );

    pthread_event_destroy( &data_ready );
    pthread_event_destroy( &data_needed );

    return 0;
}

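Basically, the two threads signal each other to start doing something, and each one goes on with its own work after a short timeout even if it was not signalled.

Any idea what is going wrong there?

Thanks.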

1 answer:

Answer 0 (score: 1)

The problem is the timeout_time argument to pthread_cond_timedwait(). The way you compute it will quite soon produce an invalid value, with the nanosecond part greater than or equal to one billion. In that case pthread_cond_timedwait() may return EINVAL, probably before it actually waits on the condition. Since the wait loop only breaks on ETIMEDOUT, it presumably keeps spinning with the mutex held, so the other thread blocks in pthread_event_set(), which looks like a deadlock.

The problem can be found very quickly with valgrind --tool=helgrind ./test_prog (it soon reported that it had detected 10000000 errors and gave up counting):

bash$ gcc -Werror -Wall -g test.c -o test -lpthread && valgrind --tool=helgrind ./test
==3035== Helgrind, a thread error detector
==3035== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==3035== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==3035== Command: ./test
==3035==
t1: tick
t2: tick
t2: tick
t1: tick
t2: tick
t1: tick
t1: tick
t2: tick
t1: tick
t2: tick
t1: tick
==3035== ---Thread-Announcement------------------------------------------
==3035==
==3035== Thread #2 was created
==3035==    at 0x41843C8: clone (clone.S:110)
==3035==
==3035== ----------------------------------------------------------------
==3035==
==3035== Thread #2's call to pthread_cond_timedwait failed
==3035==    with error code 22 (EINVAL: Invalid argument)
==3035==    at 0x402DB03: pthread_cond_timedwait_WRK (hg_intercepts.c:784)
==3035==    by 0x8048910: pthread_event_wait (test.c:65)
==3035==    by 0x8048965: thread_fx1 (test.c:80)
==3035==    by 0x402E437: mythread_wrapper (hg_intercepts.c:219)
==3035==    by 0x407DD77: start_thread (pthread_create.c:311)
==3035==    by 0x41843DD: clone (clone.S:131)
==3035==
t2: tick
==3035==
==3035== More than 10000000 total errors detected. I'm not reporting any more.
==3035== Final error counts will be inaccurate. Go fix your program!
==3035== Rerun with --error-limit=no to disable this cutoff. Note
==3035== that errors may occur in your program without prior warning from
==3035== Valgrind, because errors are no longer being displayed.
==3035==
^C==3035==
==3035== For counts of detected and suppressed errors, rerun with: -v
==3035== Use --history-level=approx or =none to gain increased speed, at
==3035== the cost of reduced accuracy of conflicting-access information
==3035== ERROR SUMMARY: 10000000 errors from 1 contexts (suppressed: 412 from 109)
Killed

Two further minor remarks:

  1. For improved correctness, in your pthread_event_set() you could do the condition variable broadcast before the mutex unlock (the effect of the wrong ordering is essentially that it can break the determinism of scheduling; helgrind complains about this issue too);
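  2. You can safely drop the mutex locking in pthread_event_get() when returning the value of State; this should be an atomic operation (a single unsigned char read is atomic in practice on common platforms, although C11 formally treats an unsynchronized read as a data race).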
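
As a rough illustration of the timeout fix, here is a minimal sketch that carries any nanosecond overflow into the seconds field, so tv_nsec always stays below one billion. The names deadline_after() and pthread_event_wait_fixed() are hypothetical and not part of the original program. The sketch also assumes you want the wait loop to stop on any error from pthread_cond_timedwait(), not only on ETIMEDOUT, so that an unexpected EINVAL cannot turn into a busy loop.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/time.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Same layout as the pthread_event struct in the question. */
typedef struct {
    pthread_mutex_t Mutex;
    pthread_cond_t  Condition;
    unsigned char   State;
} pthread_event;

/* Hypothetical helper: absolute deadline timeout_ms after `now`,
 * with tv_nsec normalized into the range [0, 1000000000). */
struct timespec deadline_after(struct timeval now, unsigned int timeout_ms)
{
    struct timespec ts;
    /* At most 999999000 + 999000000, which still fits in a 32-bit long. */
    long nsec = (long)now.tv_usec * 1000L
              + (long)(timeout_ms % 1000u) * 1000000L;

    ts.tv_sec  = now.tv_sec + (time_t)(timeout_ms / 1000u)
               + (time_t)(nsec / NSEC_PER_SEC);   /* carry into seconds */
    ts.tv_nsec = nsec % NSEC_PER_SEC;
    return ts;
}

/* Hypothetical variant of pthread_event_wait() from the question. */
unsigned char pthread_event_wait_fixed(pthread_event *ev,
                                       unsigned char state,
                                       unsigned int timeout_ms)
{
    struct timeval now;
    struct timespec deadline;
    unsigned char result;

    gettimeofday(&now, NULL);
    deadline = deadline_after(now, timeout_ms);

    pthread_mutex_lock(&ev->Mutex);
    while (ev->State != state) {
        /* Leave the loop on ETIMEDOUT and on any other error. */
        if (pthread_cond_timedwait(&ev->Condition, &ev->Mutex, &deadline) != 0)
            break;
    }
    result = ev->State;
    pthread_mutex_unlock(&ev->Mutex);
    return result;
}
```

With this helper, a current time of 10 s plus 950000 us and a 90 ms timeout gives a deadline of 11 s plus 40000000 ns, instead of the invalid 10 s plus 1040000000 ns that the original arithmetic produces.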