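I'm trying to figure out why my synchronization code deadlocks when I use the pthread library. Using WinAPI primitives instead of pthread works without problems. Using C++11 threads also works fine (except when compiled with Visual Studio 2012 Service Pack 3, where it simply crashes; Microsoft has accepted that as a bug). Using pthread, however, turns out to be a problem, at least when running on a Linux machine; I haven't had a chance to try a different OS.
I wrote a simple program to illustrate the problem. The code only demonstrates the deadlock; I'm well aware that the design is very bad and could be written much better.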
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* A simple "event" object: a state flag guarded by a mutex, plus a
   condition variable used to wait for state changes. */
typedef struct _pthread_event
{
    pthread_mutex_t Mutex;
    pthread_cond_t Condition;
    unsigned char State;
} pthread_event;

void pthread_event_create( pthread_event * ev , unsigned char init_state )
{
    pthread_mutex_init( &ev->Mutex , 0 );
    pthread_cond_init( &ev->Condition , 0 );
    ev->State = init_state;
}

void pthread_event_destroy( pthread_event * ev )
{
    pthread_cond_destroy( &ev->Condition );
    pthread_mutex_destroy( &ev->Mutex );
}

void pthread_event_set( pthread_event * ev , unsigned char state )
{
    pthread_mutex_lock( &ev->Mutex );
    ev->State = state;
    pthread_mutex_unlock( &ev->Mutex );
    pthread_cond_broadcast( &ev->Condition );
}

unsigned char pthread_event_get( pthread_event * ev )
{
    unsigned char result;
    pthread_mutex_lock( &ev->Mutex );
    result = ev->State;
    pthread_mutex_unlock( &ev->Mutex );
    return result;
}

unsigned char pthread_event_wait( pthread_event * ev , unsigned char state , unsigned int timeout_ms )
{
    struct timeval time_now;
    struct timespec timeout_time;
    unsigned char result;
    gettimeofday( &time_now , NULL );
    timeout_time.tv_sec = time_now.tv_sec + ( timeout_ms / 1000 );
    timeout_time.tv_nsec = time_now.tv_usec * 1000 + ( ( timeout_ms % 1000 ) * 1000000 );
    pthread_mutex_lock( &ev->Mutex );
    while ( ev->State != state )
        if ( ETIMEDOUT == pthread_cond_timedwait( &ev->Condition , &ev->Mutex , &timeout_time ) ) break;
    result = ev->State;
    pthread_mutex_unlock( &ev->Mutex );
    return result;
}

static pthread_t thread_1;
static pthread_t thread_2;
static pthread_event data_ready;
static pthread_event data_needed;

/* Thread 1: wait up to 90 ms for data_needed, clear it, "work" for
   100 ms, then signal data_ready. */
void * thread_fx1( void * c )
{
    for ( ; ; )
    {
        pthread_event_wait( &data_needed , 1 , 90 );
        pthread_event_set( &data_needed , 0 );
        usleep( 100000 );
        pthread_event_set( &data_ready , 1 );
        printf( "t1: tick\n" );
    }
}

/* Thread 2: wait up to 50 ms for data_ready, clear it, signal
   data_needed, then "work" for 100 ms. */
void * thread_fx2( void * c )
{
    for ( ; ; )
    {
        pthread_event_wait( &data_ready , 1 , 50 );
        pthread_event_set( &data_ready , 0 );
        pthread_event_set( &data_needed , 1 );
        usleep( 100000 );
        printf( "t2: tick\n" );
    }
}

int main( int argc , char * argv[] )
{
    pthread_event_create( &data_ready , 0 );
    pthread_event_create( &data_needed , 0 );
    pthread_create( &thread_1 , NULL , thread_fx1 , 0 );
    pthread_create( &thread_2 , NULL , thread_fx2 , 0 );
    pthread_join( thread_1 , NULL );
    pthread_join( thread_2 , NULL );
    pthread_event_destroy( &data_ready );
    pthread_event_destroy( &data_needed );
    return 0;
}
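Basically, the two threads signal each other to start doing something, and each one goes ahead with its own work anyway if no signal arrives within a short timeout.
Any idea what's going wrong there?
Thanks.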
Answer:
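The problem is the timeout_time argument of pthread_cond_timedwait(). The way you add the timeout to it, it will quite soon end up with an invalid value whose nanosecond part is greater than or equal to one billion. In that case pthread_cond_timedwait() may return EINVAL, probably before it has actually waited on the condition at all. Because the deadline is computed only once, before the loop, every retry fails the same way; and since the while loop only leaves on ETIMEDOUT, the waiting thread spins with ev->Mutex still locked, so the other thread blocks in pthread_event_set().
The problem can be found very quickly with Valgrind's helgrind tool (it soon reported that it had detected 10000000 errors and gave up counting):
valgrind --tool=helgrind ./test_prog
There are also two other minor remarks at the end of this answer.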
bash$ gcc -Werror -Wall -g test.c -o test -lpthread && valgrind --tool=helgrind ./test
==3035== Helgrind, a thread error detector
==3035== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==3035== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==3035== Command: ./test
==3035==
t1: tick
t2: tick
t2: tick
t1: tick
t2: tick
t1: tick
t1: tick
t2: tick
t1: tick
t2: tick
t1: tick
==3035== ---Thread-Announcement------------------------------------------
==3035==
==3035== Thread #2 was created
==3035== at 0x41843C8: clone (clone.S:110)
==3035==
==3035== ----------------------------------------------------------------
==3035==
==3035== Thread #2's call to pthread_cond_timedwait failed
==3035== with error code 22 (EINVAL: Invalid argument)
==3035== at 0x402DB03: pthread_cond_timedwait_WRK (hg_intercepts.c:784)
==3035== by 0x8048910: pthread_event_wait (test.c:65)
==3035== by 0x8048965: thread_fx1 (test.c:80)
==3035== by 0x402E437: mythread_wrapper (hg_intercepts.c:219)
==3035== by 0x407DD77: start_thread (pthread_create.c:311)
==3035== by 0x41843DD: clone (clone.S:131)
==3035==
t2: tick
==3035==
==3035== More than 10000000 total errors detected. I'm not reporting any more.
==3035== Final error counts will be inaccurate. Go fix your program!
==3035== Rerun with --error-limit=no to disable this cutoff. Note
==3035== that errors may occur in your program without prior warning from
==3035== Valgrind, because errors are no longer being displayed.
==3035==
^C==3035==
==3035== For counts of detected and suppressed errors, rerun with: -v
==3035== Use --history-level=approx or =none to gain increased speed, at
==3035== the cost of reduced accuracy of conflicting-access information
==3035== ERROR SUMMARY: 10000000 errors from 1 contexts (suppressed: 412 from 109)
Killed
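A minimal sketch of one way to fix the deadline, assuming you keep gettimeofday(): carry whole seconds out of the nanosecond field so that tv_nsec always stays below 1000000000. With the original formula, a 90 ms timeout overflows whenever tv_usec is at least 910000 (for 50 ms, at least 950000). The helper names deadline_from_timeval() and wait_for_state() are hypothetical, not from the question; the loop also stops on any non-zero return, not only ETIMEDOUT, so an invalid argument can no longer cause an endless spin.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: absolute deadline = now + timeout_ms, with the
   nanosecond field normalized into [0, NSEC_PER_SEC). */
struct timespec deadline_from_timeval(struct timeval now, unsigned int timeout_ms)
{
    struct timespec ts;
    /* At most 999999000 + 999000000, which still fits in a 32-bit long. */
    long nsec = (long)now.tv_usec * 1000L + (long)(timeout_ms % 1000) * 1000000L;

    /* Whole seconds from the timeout plus the carry out of nsec. */
    ts.tv_sec = now.tv_sec + (time_t)(timeout_ms / 1000) + (time_t)(nsec / NSEC_PER_SEC);
    ts.tv_nsec = nsec % NSEC_PER_SEC;

    assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
    return ts;
}

/* Hypothetical replacement for the wait loop: leave on *any* non-zero
   return from pthread_cond_timedwait(), not only on ETIMEDOUT. */
unsigned char wait_for_state(pthread_mutex_t *mutex, pthread_cond_t *cond,
                             const unsigned char *state, unsigned char wanted,
                             unsigned int timeout_ms)
{
    struct timeval now;
    struct timespec deadline;
    unsigned char result;

    gettimeofday(&now, NULL);
    deadline = deadline_from_timeval(now, timeout_ms);

    pthread_mutex_lock(mutex);
    while (*state != wanted)
        if (pthread_cond_timedwait(cond, mutex, &deadline) != 0)
            break; /* ETIMEDOUT, or an error such as EINVAL */
    result = *state;
    pthread_mutex_unlock(mutex);
    return result;
}
```

In the question's program, pthread_event_wait() would build its deadline the same way and use the same loop. clock_gettime(CLOCK_REALTIME, ...) is a common alternative to gettimeofday() here, since pthread_cond_timedwait() measures its deadline against CLOCK_REALTIME by default.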
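First, in pthread_event_set() you may do the condition variable broadcast before unlocking the mutex (the wrong ordering can basically break the determinism of scheduling; helgrind complains about this issue too).
Second, the value of … - this should be an atomic operation.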
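A minimal sketch of the first remark, reusing the pthread_event struct from the question: pthread_event_set() broadcasts while it still holds the mutex and only then unlocks it. POSIX allows pthread_cond_broadcast() to be called with or without the mutex held, but says the mutex should be held when predictable scheduling behavior is required.

```c
#include <pthread.h>

/* Same event type as in the question. */
typedef struct _pthread_event
{
    pthread_mutex_t Mutex;
    pthread_cond_t Condition;
    unsigned char State;
} pthread_event;

/* Update the state and wake all waiters while the mutex is still held,
   then release the mutex. */
void pthread_event_set(pthread_event *ev, unsigned char state)
{
    pthread_mutex_lock(&ev->Mutex);
    ev->State = state;
    pthread_cond_broadcast(&ev->Condition);
    pthread_mutex_unlock(&ev->Mutex);
}
```

Waiters woken by the broadcast simply block on the mutex until pthread_event_set() releases it, so correctness is unchanged; only the ordering of wake-ups becomes more predictable.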