I debugged a POSIX pthread-based C program with "valgrind --tool=drd …". Valgrind hit an internal error and terminated the thread with the following message:
drd: drd_vc.c:96 (vgDrd_vc_increment): Assertion 'oldcount < vc->vc[i].count' failed.
host stacktrace:
==27993== at 0x38025C68: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x38025D94: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x38025F21: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x38018317: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x380183EC: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x380187FC: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x3801D87E: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x3800967A: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x3803DB80: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x38078BDF: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
==27993== by 0x3808742A: ??? (in /usr/lib64/valgrind/drd-amd64-linux)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 27993)
==27993== at 0x4C339D3: pthread_mutex_unlock (in /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
==27993== by 0x47F671: lf_pthread_mutex_unlock (htab2.c:192)
==27993== by 0x405194: prepare_to_read_n_go (ep3.c:805)
==27993== by 0x4053C4: reading_begin (ep3.c:847)
==27993== by 0x405CFF: start_file_loader (ep3.c:1062)
==27993== by 0x405D4E: start_services (ep3.c:1076)
==27993== by 0x406743: init_procs (ep3.c:1295)
==27993== by 0x40336B: main (ep.c:89)
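In the program source, the call to "pthread_mutex_unlock" inside "lf_pthread_mutex_unlock" is the last statement executed before the termination. I am using openSUSE, which reports "Linux linux 4.4.76-1-default #1 SMP Fri Jul 14 08:48:13 UTC 2017 (9a2885c) x86_64 x86_64 x86_64 GNU/Linux". The compiler is gcc-4.8.
I found the relevant source code in drd_vc.c: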
/** Increment the clock of thread 'tid' in vector clock 'vc'. */
void DRD_(vc_increment)(VectorClock* const vc, DrdThreadId const tid)
{
  unsigned i;
  for (i = 0; i < vc->size; i++)
  {
    if (vc->vc[i].threadid == tid)
    {
      typeof(vc->vc[i].count) const oldcount = vc->vc[i].count;
      vc->vc[i].count++;
      // Check for integer overflow.
      tl_assert(oldcount < vc->vc[i].count);
      return;
    }
  }
  …
}
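Here the statement "tl_assert(oldcount < vc->vc[i].count);" is the assertion that failed, matching the message at drd_vc.c:96 above.
I do not understand what the per-thread counter "count" means, or how it can overflow in my program. My program processes a large amount of data (millions of records). I changed the program to use different numbers of variables and the error still occurred every time, so I guess it is not caused by memory being overwritten.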
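To make the question concrete, here is a minimal sketch of the only way I can see that check failing. It is not DRD's code: the struct `vc_elem` and the function `vc_elem_increment` are hypothetical names, and the 32-bit unsigned width of `count` is an assumption on my part. It only mirrors the increment-then-compare pattern quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one vector clock entry. The field names follow
   the quoted DRD code; the 32-bit width of count is an assumption. */
typedef struct {
    unsigned threadid;
    uint32_t count;
} vc_elem;

/* Increment the clock and evaluate the same condition the quoted
   tl_assert checks. Returns false when the counter wrapped around. */
static bool vc_elem_increment(vc_elem *e)
{
    assert(e != NULL);
    const uint32_t oldcount = e->count;
    e->count++;                 /* unsigned arithmetic wraps modulo 2^32 */
    return oldcount < e->count; /* false only if count wrapped to 0 */
}
```

Under that assumption, the condition is false only when `count` was already at its maximum value, so the same thread's clock would have to be incremented about 2^32 (roughly 4.29 billion) times before the assertion fires.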
Any information about what the code in drd_vc.c does would be greatly appreciated. Thank you.