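I am developing shared-memory IPC in C.
In the design, the main server thread listens on a queue (called the main request queue), which lives in shared memory and is used for new connection requests. When a client process wants to connect to the server, it first locks the mutex on that queue, then puts in its request, then unlocks the mutex.
When I run multiple clients in parallel, say 100 clients, one or two of them usually fail to get the mutex and wait about 60 seconds before acquiring the lock. The other 98 or 99 clients get the mutex as soon as it is free and continue executing. The mutex is never held for more than 1 millisecond, because only a simple memcpy happens under it... so something else is going wrong, and I suspect a bug in the pthread implementation (most probably it isn't, but still :)).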
This is the backtrace in GDB while the client process is waiting for the mutex:
#0 0x00007fa621bb7f1c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fa621bb3649 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fa621bb3470 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007fa6207d4bdf in get_connection () at src/shm_ipc/ipc_req.c:24
#4 0x00007fa6207d4dfa in RegisterConnection () at src/shm_ipc/ipc_req.c:87
#5 0x00007fa6207db161 in IPC_Init () at src/shm_ipc/shm_rocksdb.c:10
#6 0x00007fa6207c66bb in transitive_closure_pali (fcinfo=0x2d86158) at src/GraphIndex_Functions.c:50
#7 0x00000000006c00b4 in ExecInterpExpr (state=0x2d86068, econtext=0x2d86678, isnull=0x7fff221d688c "") at execExprInterp.c:672
#8 0x00000000007bd51d in ExecEvalExprSwitchContext (state=0x2d86068, econtext=0x2d86678, isNull=0x7fff221d688c "") at ../../../../src/include/executor/executor.h:308
#9 0x00000000007c44d8 in evaluate_expr (expr=0x2d81ee8, result_type=25, result_typmod=-1, result_collation=100) at clauses.c:4735
#10 0x00000000007c3906 in evaluate_function (funcid=287915, result_type=25, result_typmod=-1, result_collid=100, input_collid=0, args=0x2b7be38, funcvariadic=0 '\000', func_tuple=0x7fa6092c8278,
context=0x7fff221d7af0) at clauses.c:4292
#11 0x00000000007c2e31 in simplify_function (funcid=287915, result_type=25, result_typmod=-1, result_collid=100, input_collid=0, args_p=0x7fff221d6a10, funcvariadic=0 '\000', process_args=1 '\001',
allow_non_const=1 '\001', context=0x7fff221d7af0) at clauses.c:3932
#12 0x00000000007c08e7 in eval_const_expressions_mutator (node=0x2b7b9b8, context=0x7fff221d7af0) at clauses.c:2591
#13 0x000000000072d236 in expression_tree_mutator (node=0x2b7ba10, mutator=0x7c0401 <eval_const_expressions_mutator>, context=0x7fff221d7af0) at nodeFuncs.c:2854
#14 0x00000000007c2866 in eval_const_expressions_mutator (node=0x2b7ba10, context=0x7fff221d7af0) at clauses.c:3582
#15 0x000000000072d428 in expression_tree_mutator (node=0x2b7ba68, mutator=0x7c0401 <eval_const_expressions_mutator>, context=0x7fff221d7af0) at nodeFuncs.c:2903
#16 0x00000000007c2866 in eval_const_expressions_mutator (node=0x2b7ba68, context=0x7fff221d7af0) at clauses.c:3582
#17 0x00000000007c03ae in eval_const_expressions (root=0x2b7bba0, node=0x2b7ba68) at clauses.c:2433
#18 0x00000000007a1f70 in preprocess_expression (root=0x2b7bba0, expr=0x2b7ba68, kind=1) at planner.c:915
#19 0x00000000007a1780 in subquery_planner (glob=0x2b7b6a0, parse=0x2b7b110, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at planner.c:628
#20 0x00000000007a0dcd in standard_planner (parse=0x2b7b110, cursorOptions=256, boundParams=0x0) at planner.c:334
#21 0x00000000007a0b4a in planner (parse=0x2b7b110, cursorOptions=256, boundParams=0x0) at planner.c:210
#22 0x0000000000889a03 in pg_plan_query (querytree=0x2b7b110, cursorOptions=256, boundParams=0x0) at postgres.c:796
#23 0x0000000000889b30 in pg_plan_queries (querytrees=0x2b7bb68, cursorOptions=256, boundParams=0x0) at postgres.c:862
#24 0x0000000000889e5a in exec_simple_query (query_string=0x2b79f70 "SELECT transitive_closure_pali('public.friendship','cd70a1b1-c203-42d7-ac3c-51b165fc3285','friends__id', 5);") at postgres.c:1027
#25 0x000000000088e431 in PostgresMain (argc=1, argv=0x2ce6638, dbname=0x2ce6530 "postgres", username=0x2ce6508 "insaf-5680") at postgres.c:4090
#26 0x00000000007ef252 in BackendRun (port=0x2cded70) at postmaster.c:4357
#27 0x00000000007ee97c in BackendStartup (port=0x2cded70) at postmaster.c:4029
#28 0x00000000007eaf42 in ServerLoop () at postmaster.c:1753
#29 0x00000000007ea52e in PostmasterMain (argc=3, argv=0x2b5a6e0) at postmaster.c:1361
#30 0x0000000000727e0c in main (argc=3, argv=0x2b5a6e0) at main.c:228
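I am building a postgres extension, which is why the stack trace contains those other functions.
get_connection() is the function that calls pthread_mutex_lock(). The source code of get_connection() is as follows: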
static int get_connection(void)
{
    int queue_id;

    while (1)
    {
        pthread_mutex_lock(&(IPCComm->backend_queue_lock));
        for (queue_id = 0; queue_id < MaxConnections; queue_id++)
        {
            if (IPCComm->backend_queue[queue_id].free)
            {
                IPCComm->backend_queue[queue_id].free = false;
                break;
            }
        }
        pthread_mutex_unlock(&(IPCComm->backend_queue_lock));
        /* If a free connection was found, leave the loop; otherwise sleep for a while and try again */
        if (queue_id != MaxConnections)
            break;
        else
        {
            usleep(1000 * 10);
        }
    }
    return queue_id;
}
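What I have tried: if I signal the process in some way, for example by attaching to it with gdb and then continuing, it quickly acquires the mutex and continues executing. So it is somehow sleeping for 60 seconds. I tried browsing the glibc source code but could not find any code like that.
My system details: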
$ uname -vr
4.2.0-36-generic #42~14.04.1-Ubuntu SMP Fri May 13 17:27:22 UTC 2016
$ ldd --version
ldd (Ubuntu EGLIBC 2.19-0ubuntu6.13) 2.19
$ gcc --version
gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5
Please help.
Answer 0 (score: 1)
The problem was that I initialized the mutex with the default attributes (a NULL attribute argument), like this:
pthread_mutex_init(&mutex, NULL);
I should have set the PTHREAD_PROCESS_SHARED attribute on the mutex. After setting it to PTHREAD_PROCESS_SHARED, it works fine.
Code:
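#include <pthread.h>
#include <stdio.h>
#include <string.h>

int rc;
pthread_mutexattr_t mattr;
pthread_mutex_t mutex;  /* in the real code, the mutex inside the shared memory segment */

/* pthread functions return the error number instead of setting errno,
   so report it with strerror(rc) rather than perror() */
rc = pthread_mutexattr_init(&mattr);
if (rc != 0)
    fprintf(stderr, "Error occurred in mutex attr init: %s\n", strerror(rc));
rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
if (rc != 0)
    fprintf(stderr, "Error occurred in pthread_mutexattr_setpshared: %s\n", strerror(rc));
rc = pthread_mutex_init(&mutex, &mattr);
if (rc != 0)
    fprintf(stderr, "Error occurred in pthread_mutex_init: %s\n", strerror(rc));
pthread_mutexattr_destroy(&mattr);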
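For anyone who wants to try the fix in isolation, here is a minimal sketch, not the code from the extension. It assumes Linux with an anonymous shared mapping created by mmap() and child processes created by fork(). The names shared_region, shared_region_create and run_workers are hypothetical and exist only for this illustration. The mutex default is PTHREAD_PROCESS_PRIVATE, and POSIX leaves the behaviour undefined when such a mutex is used from more than one process, so the attribute has to be set before pthread_mutex_init().

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS on strict compilers */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared region: a mutex plus the counter it protects. */
typedef struct {
    pthread_mutex_t lock;
    long counter;
} shared_region;

/* Map an anonymous shared region and initialise a process-shared mutex
   inside it. Returns NULL on failure. */
static shared_region *shared_region_create(void)
{
    shared_region *r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        /* The important line: allow use from several processes. */
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(&r->lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        /* pthread calls return the error number; errno is not set. */
        fprintf(stderr, "mutex setup failed: %s\n", strerror(rc));
        munmap(r, sizeof *r);
        return NULL;
    }
    r->counter = 0;
    return r;
}

/* Fork nprocs children; each increments the counter iters times while
   holding the lock. Returns the final counter value, or -1 on error. */
static long run_workers(int nprocs, int iters)
{
    shared_region *r = shared_region_create();
    if (r == NULL)
        return -1;

    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: the mapping and the mutex are shared with the parent. */
            for (int j = 0; j < iters; j++) {
                pthread_mutex_lock(&r->lock);
                r->counter++;
                pthread_mutex_unlock(&r->lock);
            }
            _exit(0);
        }
        if (pid > 0)
            started++;
    }

    /* Parent: wait for every child before reading the result. */
    while (wait(NULL) > 0)
        ;

    long total = r->counter;
    pthread_mutex_destroy(&r->lock);
    munmap(r, sizeof *r);
    return started == nprocs ? total : -1;
}
```

Called from your own main() and built with -pthread, run_workers(4, 10000) should return 40000, since every increment happens under the process-shared lock. In the extension itself the same attribute setup would apply to the mutex that already lives in the shared segment (backend_queue_lock in the question), and it should be initialised once, by whichever process creates the segment.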