Question

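I'm trying to parallelize Dijkstra's pathfinding algorithm with pthreads, but I've hit a deadlock that I can't figure out. The gist is that each thread has its own priority queue (a std::multiset) and a mutex for that queue, which is locked whenever the queue needs to be modified.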

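Each node has an owner thread, given by the node ID modulo the thread count. If a thread is examining a node's neighbors and lowers one of their weights (labels) below its previous value, it locks the owner's queue and removes and reinserts the node (this forces the set to update the node's position in the queue). However, this implementation appears to deadlock. I can't see why, because as far as I can tell each thread only ever holds one lock at a time.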

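Each thread's initial queue contains all of its nodes, but every node except the source starts with a weight of ULONG_MAX. If a thread runs out of work (it pulls a node with weight ULONG_MAX from its queue), it just keeps locking and unlocking until another thread gives it work.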

void *Dijkstra_local_owner_worker(void *param){
  struct thread_args *myargs = ((struct thread_args *)param);

  int tid = myargs->tid;
  std::multiset<Node *,cmp_label> *Q = (myargs->Q);

  struct thread_args *allargs = ((struct thread_args *)param)-tid;

  AdjGraph *G = (AdjGraph *)allargs[thread_count].Q;
  struct Node *n, *p;
  int owner;
  std::set<Edge>::iterator it;
  Edge e;

  pthread_mutex_lock(&myargs->mutex);
  while(!Q->empty()){
    n = *Q->begin(); Q->erase(Q->begin());
    pthread_mutex_unlock(&myargs->mutex);

    if(n->label == ULONG_MAX){
      pthread_mutex_lock(&myargs->mutex);
      Q->insert(n);
      continue;
    }
    for( it = n->edges->begin(); it != n->edges->end(); it++){
      e = *it;
      p = G->getNode(e.dst);
      owner = (int)(p->index % thread_count);
      if(p->label > n->label + e.weight){
        pthread_mutex_lock(&(allargs[owner].mutex));
        allargs[owner].Q->erase(p);
        p->label = n->label + e.weight;
        p->prev = n;
        allargs[owner].Q->insert(p);//update p's position in the PQ
        pthread_mutex_unlock(&(allargs[owner].mutex));
      }
    }
    pthread_mutex_lock(&myargs->mutex);
  }
  pthread_mutex_unlock(&myargs->mutex);
  return NULL;
}

This is the function that spawns the threads.

bool Dijkstra_local_owner(AdjGraph *G, struct Node *src){
  G->setAllNodeLabels(ULONG_MAX);
  struct thread_args args[thread_count+1];
  src->label = 0;
  struct Node *n;
  for(int i=0; i<thread_count; i++){
    args[i].Q = new std::multiset<Node *,cmp_label>;
    args[i].tid = i;
    pthread_mutex_init(&args[i].mutex,NULL);
  }

  for(unsigned long i = 0; i < G->n; i++){
    n = G->getNode(i); //give all threads their workload in advance
    args[(n->index)%thread_count].Q->insert(n);
  }
  args[thread_count].Q = (std::multiset<Node *,cmp_label> *)G;
  //hacky repackaging of a pointer to prevent use of globals
  //please note this works and is not the issue. I know it's horrible.

  pthread_t threads[thread_count];
  for(int i=0; i< thread_count; i++){
    pthread_create(&threads[i],NULL,Dijkstra_local_owner_worker,&args[i]);
  }

  for(int i=0; i< thread_count; i++){
    pthread_join(threads[i],NULL);
  }

  for(int i=0; i< thread_count; i++){
    delete args[i].Q;
  }
}

The struct definition for each thread's arguments:

struct thread_args{
  std::multiset<Node *,cmp_label> *Q; 
  pthread_mutex_t mutex; 
  int tid;
};

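My question is: where does this code deadlock? I've got tunnel vision here, so I can't see where I'm going wrong. I've made sure all the other logic works, so things like pointer dereferences are correct.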

Answer 1

Your code looks like this:

lock()
while (cond)
{
    unlock()
    if (cond1)
    {
        lock()
    }

    for (...)
    {
        ....
    }

    lock()
}

unlock()

I think it's easy to see that, depending on the path taken through the code, this approach can go wrong.

I would use the locks only around the critical operations:

lock()
Q->erase(..)
unlock()

OR

lock()
Q->insert(..)
unlock()

Try simplifying things and see if that helps.
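
One way to keep each lock confined to a single queue operation is a small RAII guard, so the mutex is released on every exit path. The following is only a sketch under assumptions: ScopedLock, LockedQueue, push and try_pop are hypothetical names that do not appear in the question, and the queue holds plain labels rather than Node pointers.

```cpp
#include <cassert>
#include <pthread.h>
#include <set>

// Hypothetical RAII guard: locks in the constructor, unlocks in the
// destructor, so a critical section cannot exit with the mutex held.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t *m) : m_(m) {
        int rc = pthread_mutex_lock(m_);
        assert(rc == 0);
        (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    }
    ~ScopedLock() { pthread_mutex_unlock(m_); }

private:
    ScopedLock(const ScopedLock &);             // non-copyable
    ScopedLock &operator=(const ScopedLock &);
    pthread_mutex_t *m_;
};

// Hypothetical stand-in for one thread's queue and its mutex.
struct LockedQueue {
    std::multiset<unsigned long> q;
    pthread_mutex_t mutex;
    LockedQueue() { pthread_mutex_init(&mutex, NULL); }
    ~LockedQueue() { pthread_mutex_destroy(&mutex); }
};

// Inserts one label; the lock covers only the insert.
void push(LockedQueue &lq, unsigned long label) {
    ScopedLock guard(&lq.mutex);
    lq.q.insert(label);
}

// Moves the smallest label into *out; returns false if the queue was empty.
bool try_pop(LockedQueue &lq, unsigned long *out) {
    ScopedLock guard(&lq.mutex);
    if (lq.q.empty()) return false;
    *out = *lq.q.begin();
    lq.q.erase(lq.q.begin());
    return true;
}
```

With helpers like these, the worker loop itself never calls lock() or unlock() directly, which makes it harder to leave the loop body with the mutex in an unexpected state.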

Answer 2

"If a thread runs out of work (it pulls a node with weight ULONG_MAX from its queue), it just keeps locking and unlocking until another thread gives it work."

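This is a potential problem: once a thread enters this state, it will hold the mutex for essentially its entire timeslice. pthreads mutexes are lightweight, which means they are not guaranteed to be fair. It is quite possible (likely, even) that the busy-waiting thread reacquires the lock before a woken waiting thread manages to acquire it.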

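You should use pthread_cond_wait() here, and signal the condition variable when another thread updates the queue. This assumes a pthread_cond_t member named cond is added to struct thread_args and initialized with pthread_cond_init(). The start of the loop would then look like: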

  pthread_mutex_lock(&myargs->mutex);

  while (!Q->empty())
  {
    n = *Q->begin();
    if (n->label == ULONG_MAX)
    {
        pthread_cond_wait(&myargs->cond, &myargs->mutex);
        continue;  /* Re-check the condition after `pthread_cond_wait()` returns */
    }

    Q->erase(Q->begin());
    pthread_mutex_unlock(&myargs->mutex);

    /* ... */

And the place where another node's queue is updated would look like:

    /* ... */
    allargs[owner].Q->insert(p); //update p's position in the PQ
    pthread_cond_signal(&allargs[owner].cond);
    pthread_mutex_unlock(&allargs[owner].mutex);
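
Put together, the wait/signal handoff might look like the self-contained sketch below. It rests on several assumptions that are not in the question: WorkQueue, worker, relax and run_demo are hypothetical names, the queue stores bare labels instead of Node pointers, and a done flag is added so the worker can exit (the original loop has no such shutdown signal, so a thread whose remaining nodes are unreachable would otherwise wait forever).

```cpp
#include <cassert>
#include <climits>
#include <pthread.h>
#include <set>

// Hypothetical stand-in for thread_args with the added condition variable.
struct WorkQueue {
    std::multiset<unsigned long> labels;  // stands in for the multiset of Node*
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool done;                // assumption: an external shutdown signal
    unsigned long processed;  // number of labels the worker has consumed
};

// Worker: sleeps on the condition variable while its best label is ULONG_MAX.
void *worker(void *param) {
    WorkQueue *wq = static_cast<WorkQueue *>(param);
    pthread_mutex_lock(&wq->mutex);
    for (;;) {
        bool has_work = !wq->labels.empty() && *wq->labels.begin() != ULONG_MAX;
        if (!has_work) {
            if (wq->done) break;
            pthread_cond_wait(&wq->cond, &wq->mutex);  // mutex is released while asleep
            continue;                                   // re-check the condition after waking
        }
        wq->labels.erase(wq->labels.begin());
        pthread_mutex_unlock(&wq->mutex);
        // ... relax outgoing edges here, without holding our own mutex ...
        pthread_mutex_lock(&wq->mutex);
        ++wq->processed;
    }
    pthread_mutex_unlock(&wq->mutex);
    return NULL;
}

// Called by another thread: lower one ULONG_MAX label and wake the owner.
void relax(WorkQueue *wq, unsigned long new_label) {
    pthread_mutex_lock(&wq->mutex);
    std::multiset<unsigned long>::iterator it = wq->labels.find(ULONG_MAX);
    if (it != wq->labels.end()) wq->labels.erase(it);  // erase by iterator removes one element only
    wq->labels.insert(new_label);
    pthread_cond_signal(&wq->cond);
    pthread_mutex_unlock(&wq->mutex);
}

// Runs one worker, feeds it `updates` relaxations, then shuts it down.
unsigned long run_demo(unsigned long updates) {
    WorkQueue wq;
    pthread_mutex_init(&wq.mutex, NULL);
    pthread_cond_init(&wq.cond, NULL);
    wq.done = false;
    wq.processed = 0;
    for (unsigned long i = 0; i < updates; ++i) wq.labels.insert(ULONG_MAX);

    pthread_t t;
    int rc = pthread_create(&t, NULL, worker, &wq);
    assert(rc == 0);
    (void)rc;
    for (unsigned long i = 0; i < updates; ++i) relax(&wq, i + 1);

    pthread_mutex_lock(&wq.mutex);
    wq.done = true;                  // set under the mutex so the wakeup cannot be missed
    pthread_cond_signal(&wq.cond);
    pthread_mutex_unlock(&wq.mutex);
    pthread_join(t, NULL);

    unsigned long result = wq.processed;
    pthread_cond_destroy(&wq.cond);
    pthread_mutex_destroy(&wq.mutex);
    return result;
}
```

Because the waiting thread gives up the mutex inside pthread_cond_wait(), the thread calling relax() is not competing with a busy-waiter for the lock. The sketch also erases by iterator rather than by key: std::multiset::erase(key) removes every element that compares equivalent to the key, which may matter if cmp_label orders nodes by label alone.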
