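I am getting a data race when calling pthread_create() recursively. I don't know whether the recursion is what causes the problem, but the race never seems to occur in the first iteration; it occurs mostly in the second and rarely in the third.
When libgc is used, there are memory corruption symptoms, such as segmentation faults, that are consistent with a data race.
The following program is a minimal example that illustrates the problem. I do not use libgc in the example, since only the data race is the subject of this question.
The data race is visible when running Valgrind with the Helgrind tool. The reported problems vary slightly between runs, and sometimes nothing is reported at all.
I am running Linux Mint 17.2. The gcc version is (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4.
The following example, 'main.c', reproduces the problem. It walks a linked list and prints each element value in a separate thread: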
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

typedef struct List {
    int head ;
    struct List* tail ;
} List ;

// create a list element with an integer head and a tail
List* new_list( int head, List* tail ) {
    List* l = (List*)malloc( sizeof( List ) ) ;
    l->head = head ;
    l->tail = tail ;
    return l ;
}

// create a thread and start it
void call( void* (*start_routine)( void* arg ), void* arg ) {
    pthread_t* thread = (pthread_t*)malloc( sizeof( pthread_t ) ) ;
    if ( pthread_create( thread, NULL, start_routine, arg ) ) {
        exit( -1 ) ;
    }
    pthread_detach( *thread ) ;
    return ;
}

void print_list( List* l ) ;

// start routine for thread
void* print_list_start_routine( void* arg ) {
    // verify that the list is not empty ( = NULL )
    // print its head
    // print the rest of it in a new thread
    if ( arg ) {
        List* l = (List*)arg ;
        printf( "%d\n", l->head ) ;
        print_list( l->tail ) ;
    }
    return NULL ;
}

// print elements of a list with one thread for each element printed
// threads are created recursively
void print_list( List* l ) {
    call( print_list_start_routine, (void*)l ) ;
}

int main( int argc, const char* argv[] ) {
    List* l = new_list( 1, new_list( 2, new_list( 3, NULL ) ) ) ;
    print_list( l ) ;
    // wait for all threads to finish
    pthread_exit( NULL ) ;
    return 0 ;
}
This is the 'makefile':
CC=gcc

a.out: main.o
	$(CC) -pthread main.o

main.o: main.c
	$(CC) -c -g -O0 -std=gnu99 -Wall main.c

clean:
	rm *.o a.out
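This is the most common output from Helgrind. Note that the lines containing only a single number (1, 2 and 3) are output from the program, not from Helgrind: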
$ valgrind --tool=helgrind ./a.out
==13438== Helgrind, a thread error detector
==13438== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==13438== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==13438== Command: ./a.out
==13438==
1
2
==13438== ---Thread-Announcement------------------------------------------
==13438==
==13438== Thread #3 was created
==13438== at 0x515543E: clone (clone.S:74)
==13438== by 0x4E44199: do_clone.constprop.3 (createthread.c:75)
==13438== by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==13438== by 0x4C30C90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438== by 0x4007EB: call (main.c:25)
==13438== by 0x400871: print_list (main.c:58)
==13438== by 0x40084D: print_list_start_routine (main.c:48)
==13438== by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438== by 0x4E45181: start_thread (pthread_create.c:312)
==13438== by 0x515547C: clone (clone.S:111)
==13438==
==13438== ---Thread-Announcement------------------------------------------
==13438==
==13438== Thread #2 was created
==13438== at 0x515543E: clone (clone.S:74)
==13438== by 0x4E44199: do_clone.constprop.3 (createthread.c:75)
==13438== by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==13438== by 0x4C30C90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438== by 0x4007EB: call (main.c:25)
==13438== by 0x400871: print_list (main.c:58)
==13438== by 0x4008BB: main (main.c:66)
==13438==
==13438== ----------------------------------------------------------------
==13438==
==13438== Possible data race during write of size 1 at 0x602065F by thread #3
==13438== Locks held: none
==13438== at 0x4C368F5: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438== by 0x4012CD6: _dl_allocate_tls_init (dl-tls.c:436)
==13438== by 0x4E45715: pthread_create@@GLIBC_2.2.5 (allocatestack.c:252)
==13438== by 0x4C30C90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438== by 0x4007EB: call (main.c:25)
==13438== by 0x400871: print_list (main.c:58)
==13438== by 0x40084D: print_list_start_routine (main.c:48)
==13438== by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438== by 0x4E45181: start_thread (pthread_create.c:312)
==13438== by 0x515547C: clone (clone.S:111)
==13438==
==13438== This conflicts with a previous read of size 1 by thread #2
==13438== Locks held: none
==13438== at 0x51C10B1: res_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==13438== by 0x51C1061: __libc_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==13438== by 0x4E45199: start_thread (pthread_create.c:329)
==13438== by 0x515547C: clone (clone.S:111)
==13438==
3
==13438==
==13438== For counts of detected and suppressed errors, rerun with: -v
==13438== Use --history-level=approx or =none to gain increased speed, at
==13438== the cost of reduced accuracy of conflicting-access information
==13438== ERROR SUMMARY: 8 errors from 1 contexts (suppressed: 56 from 48)
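As Pooja Nilangekar mentioned, replacing pthread_detach() with pthread_join() eliminates the race. However, detaching the threads is a requirement, so the goal is to detach them cleanly; in other words, to keep the threads detached while removing the race.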
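For illustration only, here is a minimal sketch of one way to keep the threads detached without calling pthread_detach() after the fact: the thread is created already detached through a pthread_attr_t, and a mutex-protected counter lets the caller wait for completion without pthread_join(). The names call_detached, pending_add, pending, printed and wait_for_detached_threads are hypothetical and do not appear in the code above. Nothing in this question establishes that this variant silences the Helgrind report; it only removes the heap-allocated pthread_t and makes the completion of the threads explicit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct List {
    int head;
    struct List *tail;
} List;

/* Hypothetical bookkeeping (not in the original post): the number of
 * detached threads that were started and have not finished yet. */
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pending_zero = PTHREAD_COND_INITIALIZER;
static int pending = 0;
static int printed = 0; /* elements printed so far; guarded by pending_lock */

/* Add delta to the counter and wake waiters once it reaches zero. */
static void pending_add(int delta) {
    pthread_mutex_lock(&pending_lock);
    pending += delta;
    assert(pending >= 0);
    if (pending == 0)
        pthread_cond_broadcast(&pending_zero);
    pthread_mutex_unlock(&pending_lock);
}

/* Start a thread that is detached from the moment it is created.
 * Returns 0 on success, otherwise the error number of the failing call. */
static int call_detached(void *(*start_routine)(void *), void *arg) {
    pthread_attr_t attr;
    pthread_t thread; /* never used again, so a local is enough */
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0) {
        pending_add(1); /* count the thread before it can finish */
        err = pthread_create(&thread, &attr, start_routine, arg);
        if (err != 0)
            pending_add(-1); /* creation failed: undo the count */
    }
    pthread_attr_destroy(&attr);
    return err;
}

void print_list(List *l);

/* Thread start routine: print the head, then hand the tail to a new thread. */
static void *print_list_start_routine(void *arg) {
    List *l = arg;
    if (l != NULL) {
        printf("%d\n", l->head);
        pthread_mutex_lock(&pending_lock);
        printed++;
        pthread_mutex_unlock(&pending_lock);
        print_list(l->tail); /* next thread is counted before this one ends */
    }
    pending_add(-1);
    return NULL;
}

/* Print the list with one detached thread per element (plus one for NULL). */
void print_list(List *l) {
    if (call_detached(print_list_start_routine, l) != 0)
        exit(EXIT_FAILURE);
}

/* Block until every thread started through call_detached() has finished. */
void wait_for_detached_threads(void) {
    pthread_mutex_lock(&pending_lock);
    while (pending > 0)
        pthread_cond_wait(&pending_zero, &pending_lock);
    pthread_mutex_unlock(&pending_lock);
}
```

In a variant of main() built on this sketch, print_list( l ) would be followed by wait_for_detached_threads() rather than relying only on pthread_exit( NULL ). Because each thread starts its successor before decrementing the counter, the counter cannot reach zero while part of the list is still unprinted.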
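There seems to be some unintended sharing between the threads. The unintended sharing may be related to what is discussed here: http://www.domaigne.com/blog/computing/joinable-and-detached-threads/ in particular the bug in the example.
I still do not understand what is actually happening.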
Answer 0 (score: 0)
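Just a note (I do not have enough reputation to comment): I get very similar Helgrind output without any recursion. I spawn a thread from a lambda and detach it.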
==9060== Possible data race during write of size 1 at 0x126CE63F by thread #1
==9060== Locks held: none
==9060== at 0x4C36D85: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==9060== by 0x4012D66: _dl_allocate_tls_init (dl-tls.c:436)
==9060== by 0x6B04715: get_cached_stack (allocatestack.c:252)
==9060== by 0x6B04715: allocate_stack (allocatestack.c:501)
==9060== by 0x6B04715: pthread_create@@GLIBC_2.2.5 (pthread_create.c:500)
==9060== by 0x4C30E0D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==9060== by 0x6359D23: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==9060== by 0x404075: thread<main()::<lambda()> > (thread:138)
==9060== by 0x404075: main (test1.cpp:162)
==9060==
==9060== This conflicts with a previous read of size 8 by thread #2
==9060== Locks held: none
==9060== at 0x6E83931: res_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==9060== by 0x6E838E1: __libc_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==9060== by 0x6B0419B: start_thread (pthread_create.c:329)
==9060== by 0x6E1803C: clone (clone.S:111)
==9060== Address 0x126ce63f is not stack'd, malloc'd or on a free list
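However, I do this in a loop, and it is reported only once. This suggests that something in the TLS mechanism may be what triggers the warning.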