Question

I've implemented a work queue pattern in C (within a Python extension) and I'm disappointed with the performance.

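I have a simulation with a list of particles ("elements"). I benchmark the time taken to perform all the calculations required for a timestep and record it along with the number of particles involved. I am running the code on a quad-core hyperthreaded i7, so I expected performance to rise (time taken to fall) with the number of threads up to about 8. Instead, the fastest implementation has no worker threads (functions are simply executed instead of being added to the queue), and the code gets slower with each worker thread added (by more than the time of the unthreaded implementation for each new thread!). I've had a quick look at my processor usage application, and Python never seems to exceed 130% CPU usage, no matter how many threads are running. The machine has plenty of headroom above that; overall system usage is about 200%.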

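Part of my queue implementation (shown below) chooses an item at random from the queue, since executing each work item requires a lock on two elements, and similar elements will be near each other in the queue. So I want the threads to pick random indices and attack different parts of the queue to minimise mutex clashes.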

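I've read that my initial attempts with rand() would have been slow because my random numbers weren't thread safe (does that sentence make sense? not sure...)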

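I've tried the implementation with both random() and drand48_r (although unfortunately the latter seems to be unavailable on OS X), but neither improved the timings.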

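Perhaps someone else can tell me what might be causing the problem? The code (the worker function) is below; do shout if you think any of the queue_add functions or constructors would be useful to see too.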

void* worker_thread_function(void* untyped_queue) {

  queue_t* queue = (queue_t*)untyped_queue;
  int success = 0;
  int rand_id;
  long int temp;
  work_item_t* work_to_do = NULL;
  int work_items_completed = 0;

  while (1) {
    if (pthread_mutex_lock(queue->mutex)) {

      // error case, try again:
      continue;
    }

    while (!success) {

      if (queue->queue->count == 0) {

        pthread_mutex_unlock(queue->mutex);
        break;
      }

      // choose a random item from the work queue, in order to avoid clashing element mutexes.
      rand_id = random() % queue->queue->count;

      if (!pthread_mutex_trylock(((work_item_t*)queue->queue->items[rand_id])->mutex)) {

        // obtain mutex locks on both elements for the work item.
        work_to_do = (work_item_t*)queue->queue->items[rand_id];

        if (!pthread_mutex_trylock(((element_t*)work_to_do->element_1)->mutex)){ 
          if (!pthread_mutex_trylock(((element_t*)work_to_do->element_2)->mutex)) {

            success = 1;
          } else {

            // only locked element_1 and work item:
            pthread_mutex_unlock(((element_t*)work_to_do->element_1)->mutex);
            pthread_mutex_unlock(work_to_do->mutex);
            work_to_do = NULL;
          }
        } else {

          // couldn't lock element_1, didn't even try 2:
          pthread_mutex_unlock(work_to_do->mutex);
          work_to_do = NULL;
        }
      }
    }

    if (work_to_do == NULL) {
      if (queue->queue->count == 0 && queue->exit_flag) {

        break;
      } else {

        continue;
      }
    }

    queue_remove_work_item(queue, rand_id, NULL, 1);
    pthread_mutex_unlock(work_to_do->mutex);

    pthread_mutex_unlock(queue->mutex);

    // At this point, we have mutex locks for the two elements in question, and a
    // work item no longer visible to any other threads. we have also unlocked the main
    // shared queue, and are free to perform the work on the elements.
    execute_function(
      work_to_do->interaction_function,
      (element_t*)work_to_do->element_1,
      (element_t*)work_to_do->element_2,
      (simulation_parameters_t*)work_to_do->params
    );

    // now finished, we should unlock both the elements:
    pthread_mutex_unlock(((element_t*)work_to_do->element_1)->mutex);
    pthread_mutex_unlock(((element_t*)work_to_do->element_2)->mutex);

    // and release the work_item RAM:
    work_item_destroy((void*)work_to_do);
    work_to_do = NULL;

    work_items_completed++;
    success = 0;
  }
  return NULL;
}

Answer 1

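Python threads do not run Python code in parallel. In CPython they are real OS-level threads, but because of the GIL (global interpreter lock) only one of them executes Python bytecode at a time. If the workers are relatively long-lived in context, rewriting the code to use processes will probably do the trick.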

Wikipedia's page on GIL

---- EDIT ----

Yes, this is in C, but the GIL still matters. Info on threads in c extensions

Answer 2

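It doesn't seem likely that random() is your problem, since it's the same code regardless of the number of threads. Since performance drops as the thread count goes up, you are probably getting killed by locking overhead. Do you really need multiple threads? How long does the work function take, and what is your average queue depth? Picking items at random seems like a bad idea. Certainly, if the queue count is <= 2 you don't need to do the rand calculation at all. Also, rather than randomly picking a queue index, it would be better to use a separate queue for each worker thread and insert in round-robin fashion. Or at least do something simple, like remembering the last index claimed by the previous thread and just not picking that one.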
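
A minimal sketch of the one-queue-per-worker idea, in case it helps. Everything here is hypothetical and not taken from your code: the names (work_pool_t, work_pool_add, work_pool_take), the fixed worker count and the fixed capacity are all assumptions, and it assumes a single producer thread does the inserting. It only removes contention on the shared queue mutex; you would still need the per-element locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4      /* hypothetical worker count */
#define QUEUE_CAPACITY 256 /* hypothetical fixed capacity per worker */

/* One queue per worker, each guarded by its own mutex. */
typedef struct {
  pthread_mutex_t mutex;
  void* items[QUEUE_CAPACITY];
  size_t count;
} worker_queue_t;

typedef struct {
  worker_queue_t queues[NUM_WORKERS];
  size_t next; /* round-robin cursor; only touched by the producer thread */
} work_pool_t;

void work_pool_init(work_pool_t* pool) {
  for (size_t i = 0; i < NUM_WORKERS; i++) {
    pthread_mutex_init(&pool->queues[i].mutex, NULL);
    pool->queues[i].count = 0;
  }
  pool->next = 0;
}

/* Producer side: hand items to the workers in turn. Returns the index of
 * the queue that received the item, or -1 if that queue was full. */
int work_pool_add(work_pool_t* pool, void* item) {
  assert(item != NULL);
  size_t target = pool->next;
  worker_queue_t* q = &pool->queues[target];
  int result = -1;

  pthread_mutex_lock(&q->mutex);
  if (q->count < QUEUE_CAPACITY) {
    q->items[q->count++] = item;
    result = (int)target;
  }
  pthread_mutex_unlock(&q->mutex);

  /* Advance the cursor even on failure so one full queue does not stall
   * the others. */
  pool->next = (target + 1) % NUM_WORKERS;
  return result;
}

/* Worker side: a worker only ever takes from its own queue, so workers
 * contend with the producer but never with each other. Takes the most
 * recently added item; returns NULL when the queue is empty. */
void* work_pool_take(work_pool_t* pool, size_t worker_id) {
  assert(worker_id < NUM_WORKERS);
  worker_queue_t* q = &pool->queues[worker_id];
  void* item = NULL;

  pthread_mutex_lock(&q->mutex);
  if (q->count > 0) {
    item = q->items[--q->count];
  }
  pthread_mutex_unlock(&q->mutex);
  return item;
}
```

Each worker would call work_pool_take with its own index in its loop. There is no random index at all, and each queue lock is held only for a push or a pop.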

Answer 3

To know whether this is the bottleneck in your program you would have to benchmark and inspect, but it might well be.

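random() and friends, which have a hidden state variable, can be serious bottlenecks for parallel programming. If they are made thread safe, this is usually done just by guarding the access with a mutex, so everything slows down.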

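The portable choice for a thread-safe random generator on POSIX systems is erand48. In contrast to drand48, it receives the state variable as an argument. You just have to keep a state variable on the stack of each of your threads (it is an unsigned short[3]) and call erand48 with that.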

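Also remember that these are pseudo-random generators. If you use the same state variable in different threads, your random numbers are not independent.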
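
A rough sketch of what that could look like. The names thread_rng_t, thread_rng_seed and thread_rng_index are made up for illustration, and seeding from a worker index is an assumption: it gives each thread a different state, which should be enough for spreading load across the queue but is not a rigorous way to get statistically independent streams.

```c
/* Request the POSIX/XSI declarations (erand48) from <stdlib.h>. */
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-thread generator state: meant to live on each worker's own stack,
 * so no locking is needed around it. */
typedef struct {
  unsigned short state[3];
} thread_rng_t;

/* Hypothetical seeding scheme: derive the state from a per-thread value
 * (for example the worker's index) so that no two threads share a state. */
void thread_rng_seed(thread_rng_t* rng, unsigned int thread_index) {
  rng->state[0] = 0x330E; /* arbitrary constant for the low 16 bits */
  rng->state[1] = (unsigned short)(thread_index & 0xFFFF);
  rng->state[2] = (unsigned short)((thread_index >> 16) & 0xFFFF);
}

/* Index in [0, count). erand48 returns a double in [0.0, 1.0) and updates
 * only the state passed in, never a hidden global. */
size_t thread_rng_index(thread_rng_t* rng, size_t count) {
  assert(count > 0);
  size_t idx = (size_t)(erand48(rng->state) * (double)count);
  return idx < count ? idx : count - 1; /* guard against rounding */
}
```

In the worker function you would declare a thread_rng_t as a local variable, seed it once before the main loop, and replace `random() % queue->queue->count` with `thread_rng_index(&rng, queue->queue->count)`.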

Python C extension: multithreading and random numbers
