Question

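I'm a hobbyist programmer experimenting with pthreads to see how much a multi-threaded program can speed up a rather long computation I'm working on. The computation runs through a std::list<string> object, pops the first element off the list, and hands it to a thread that computes something with it. The program keeps track of the active threads and ensures that a certain number of them are always running. Once the list is empty, the program sorts the resulting data, dumps a data file and terminates.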

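The multi-threaded version of the program currently does not work. It gets 20 or 40 or 200 or so elements down the list (depending on which list I give it) and then segfaults. The segfault seems to happen on specific elements of the list, meaning it does not appear to be random in any way.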

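But the strange thing is that if I compile with debug symbols and run the program through gdb, it does not segfault. It runs perfectly. Slowly, of course, but it runs and does everything the way I expect.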

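After playing around with everyone's suggestions for a while, using (among other things) valgrind's tools to try to sort out what is going on, I noticed that the simplified code below (which makes no calls outside the std library and the pthread library) causes trouble for helgrind, and this is likely the source of my problems. So here is just the simplified code, followed by helgrind's complaints.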

#include <cstdlib>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <list>
#include <iostream>
#include <signal.h>
#include <sys/select.h>

struct thread_detail {
 pthread_t *threadID; 
 unsigned long num;
};

pthread_mutex_t coutLock;

void *ThreadToSpawn(void *threadarg)
{
   struct thread_detail *my_data;
   my_data = (struct thread_detail *) threadarg;
   int taskid = my_data->num;

   struct timeval timeout;
   for (unsigned long i=0; i < 10; i++)
    { 
     timeout.tv_sec = 0;  timeout.tv_usec = 500000; // half-second 
     select( 0, NULL, NULL, NULL, & timeout );
     pthread_mutex_lock(&coutLock);
     std::cout << taskid << " "; std::cout.flush();
     pthread_mutex_unlock(&coutLock);
    }
   pthread_exit(NULL);
}


int main (int argc, char *argv[])
{
  unsigned long comp_DONE=0; 
  unsigned long comp_START=0;
  unsigned long ms_LAG=10000; // microsecond lag between polling of threads

  // set-up the mutexes
  pthread_mutex_init( &coutLock, NULL );

  if (argc != 3) { std::cout << "Program requires two arguments: (1) number of threads to use,"
                               " and (2) tasks to accomplish. \n"; exit(1); }
  unsigned long NUM_THREADS(atoi( argv[1] ));
  unsigned long comp_TODO(atoi(argv[2]));
  std::cout << "Program will have " << NUM_THREADS << " threads. \n";
  std::list < thread_detail > thread_table;

   while (comp_DONE != comp_TODO) // main loop to set-up and track threads
    {
     // poll stack of computations to see if any have finished, 
     // extract data and remove completed ones from stack
     std::list < thread_detail >::iterator i(thread_table.begin());
     while (i!=thread_table.end())
      {
       if (pthread_kill(*i->threadID,0)!=0) // thread is dead
        { // if there was relevant info in *i we'd extract it here
         if (pthread_join(*i->threadID, NULL)!=0) { std::cout << "Thread join error!\n"; exit(1); }
         pthread_mutex_lock(&coutLock);
         std::cout << i->num << " done. "; std::cout.flush();
         pthread_mutex_unlock(&coutLock);
         delete i->threadID;
         thread_table.erase(i++);  
         comp_DONE++;
        }
       else (i++);
      }
     // if list not full, toss another on the pile
     while ( (thread_table.size() < NUM_THREADS) && (comp_TODO > comp_START) )
      {
        pthread_t *tId( new pthread_t );
        thread_detail Y; Y.threadID=tId; Y.num=comp_START;
        thread_table.push_back(Y);
        int rc( pthread_create( tId, NULL, ThreadToSpawn, (void *)(&(thread_table.back() )) ) );
        if (rc) { printf("ERROR; return code from pthread_create() is %d\n", rc); exit(-1); }
        pthread_mutex_lock(&coutLock);
       std::cout << comp_START << " start. "; std::cout.flush();
        pthread_mutex_unlock(&coutLock);
        comp_START++;
      }

     // wait a specified amount of time
     struct timeval timeout;
     timeout.tv_sec = 0;  timeout.tv_usec = ms_LAG; 
     select( 0, NULL, NULL, NULL, & timeout );
    } // the big while loop

   pthread_exit(NULL);
}

Helgrind output


==2849== Helgrind, a thread error detector
==2849== Copyright (C) 2007-2009, and GNU GPL'd, by OpenWorks LLP et al.
==2849== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==2849== Command: ./thread2 2 6
==2849== 
Program will have 2 threads. 
==2849== Thread #2 was created
==2849==    at 0x64276BE: clone (clone.S:77)
==2849==    by 0x555E172: pthread_create@@GLIBC_2.2.5 (createthread.c:75)
==2849==    by 0x4C2D42C: pthread_create_WRK (hg_intercepts.c:230)
==2849==    by 0x4C2D4CF: pthread_create@* (hg_intercepts.c:257)
==2849==    by 0x401374: main (in /home/rybu/prog/regina/exercise/thread2)
==2849== 
==2849== Thread #1 is the program's root thread
==2849== 
==2849== Possible data race during write of size 8 at 0x7feffffe0 by thread #2
==2849==    at 0x4C2D54C: mythread_wrapper (hg_intercepts.c:200)
==2849==  This conflicts with a previous read of size 8 by thread #1
==2849==    at 0x4C2D440: pthread_create_WRK (hg_intercepts.c:235)
==2849==    by 0x4C2D4CF: pthread_create@* (hg_intercepts.c:257)
==2849==    by 0x401374: main (in /home/rybu/prog/regina/exercise/thread2)
==2849== 
 [0 start.]  [1 start.] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1  [0 done.]  [1 done.]  [2 start.]  [3 start.] 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3  [2 done.]  [3 done.]  [4 start.]  [5 start.] 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5  [4 done.]  [5 done.] ==2849== 
==2849== For counts of detected and suppressed errors, rerun with: -v
==2849== Use --history-level=approx or =none to gain increased speed, at
==2849== the cost of reduced accuracy of conflicting-access information
==2849== ERROR SUMMARY: 6 errors from 1 contexts (suppressed: 675 from 37)

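Presumably I'm using pthreads incorrectly, but it's not clear to me what I'm doing wrong. Moreover, I don't know what to make of the helgrind output. Earlier, helgrind complained because I had not called pthread_join on threads that the code knew, for other reasons, were dead. Adding the pthread_join took care of those complaints.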

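Reading various pthread tutorials online, I discovered that creating and destroying as many threads as the code above does is probably pointless. It is probably more efficient to have N threads running simultaneously and to use mutexes and shared memory to pass data between a "BOSS" thread and the "WORKER" threads, killing off the WORKER threads only once, at the end of the program. So that's something I'll eventually have to change, but is there anything obviously wrong with the code above?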
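A minimal sketch of the boss-worker arrangement described above, assuming every task is known before the workers start (as with the question's list), so a single mutex is enough and no condition variable is needed. The names `WorkPool`, `compute`, `worker` and `run_pool` are hypothetical and exist only for this illustration; `compute` is a stand-in for the real calculation.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical shared state for this sketch; none of these names come from the question.
struct WorkPool {
    pthread_mutex_t lock;
    std::list<std::string> tasks;      // work still to be done
    std::vector<std::size_t> results;  // one entry per finished task
};

// Stand-in for the real computation: here, just the length of the string.
static std::size_t compute(const std::string &task) { return task.size(); }

// WORKER: keep taking tasks until the list is empty, then return.
static void *worker(void *arg) {
    WorkPool *pool = static_cast<WorkPool *>(arg);
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        if (pool->tasks.empty()) {
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        std::string task = pool->tasks.front();
        pool->tasks.pop_front();
        pthread_mutex_unlock(&pool->lock);

        std::size_t r = compute(task);  // the long computation runs outside the lock

        pthread_mutex_lock(&pool->lock);
        pool->results.push_back(r);
        pthread_mutex_unlock(&pool->lock);
    }
}

// BOSS: start num_threads workers once, then join each of them exactly once.
std::vector<std::size_t> run_pool(const std::list<std::string> &tasks,
                                  unsigned num_threads) {
    WorkPool pool;
    pthread_mutex_init(&pool.lock, NULL);
    pool.tasks = tasks;

    std::vector<pthread_t> ids(num_threads);
    for (unsigned i = 0; i < num_threads; ++i) {
        int rc = pthread_create(&ids[i], NULL, worker, &pool);
        assert(rc == 0);
        (void)rc;
    }
    for (unsigned i = 0; i < num_threads; ++i)
        pthread_join(ids[i], NULL);  // blocks until that worker has returned

    pthread_mutex_destroy(&pool.lock);
    return pool.results;
}
```

Calling `run_pool(tasks, 2)` would process the whole list with two long-lived threads. Results arrive in completion order, not list order, which is fine if they are sorted afterwards as the question describes. Because the boss simply joins each worker, there is no need to poll for dead threads.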

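Edit: I'm noticing certain keywords more and more often. The term for the thing I'm trying to create is apparently a thread pool. Moreover, there are various proposals for standard implementations of this; for example, in the boost library there are boost::threadpool, boost::task and boost::thread. Some of these appear to be only proposals. I have come across threads here where people mention that you can combine ASIO and boost::thread to accomplish what I'm looking for. Similarly, there is a message queue class.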

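Hmm, so it seems I'm scratching the surface of a topic that many people are thinking about nowadays, but it seems kind of germinal, like OOP was in 1989 or something.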

Answer 1

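Try enabling core dumps (ulimit -c unlimited), then run your program without gdb. When it crashes, it should leave behind a core file, which you can then open with gdb to start investigating (gdb <executable-file> <core-file>).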

Answer 2

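Regarding top, how many threads are you using? I don't see DATA in my top output, but I have seen the virtual column balloon when using threads. My understanding (and maybe I should check this) is that each thread has its own memory space that it may potentially use. That memory is not actually being used; it is just available if needed, which is why the number can get very high without really causing problems. The memory by itself is probably not catastrophic. You should check whether DATA usage scales linearly with the number of threads you are using.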
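One way to check the per-thread reservation mentioned above is to ask pthreads how large a stack a default thread requests. This is a small sketch; the function name `default_thread_stack_size` is hypothetical, while `pthread_attr_init`, `pthread_attr_getstacksize` and `pthread_attr_destroy` are standard POSIX calls.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Returns the stack size, in bytes, that a default-initialised
// thread attribute object requests for a new thread.
std::size_t default_thread_stack_size() {
    pthread_attr_t attr;
    std::size_t size = 0;

    int rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_getstacksize(&attr, &size);
    assert(rc == 0);
    (void)rc;

    pthread_attr_destroy(&attr);
    return size;
}
```

On Linux the value is often several megabytes per thread, reserved as virtual address space rather than touched memory, which would be consistent with a virtual column that grows with the thread count while actual usage stays modest.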

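Regarding gdb: as you noted, gdb won't fix your code, but if you are corrupting memory it may move the place where the error shows up. If the corruption lands in an area you never go back to, or one you have already freed and don't try to reuse, the symptoms of the problem will go away. Go away, that is, until you need to demo your code or use it somewhere critical.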

Also, you'll want to look at helgrind, which is part of valgrind. If you have a locking problem, this kind of thing is its bread and butter:

Helgrind is a Valgrind tool for detecting synchronisation errors in C, C++ and Fortran programs that use the POSIX pthreads threading primitives.

Just do:

valgrind --tool=helgrind {your program}

Answer 3

Are you sure that is the complete code? I don't see where you create the threads or where BuildKCData is called from.

You should have a memory barrier after pthread_kill(), although I doubt it makes a difference in this case.

EDIT: You are confusing in-order execution with cache coherency.

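Cache coherency: x86 (currently) guarantees that aligned 4-byte accesses are atomic, so a[0]=123 in thread A and a[1]=456 in thread B will work: thread C will eventually see "123,456". There are various cache coherency protocols, but I believe they amount to roughly an MRSW (multiple readers, single writer) lock.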

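Out-of-order execution: x86 does not guarantee the ordering of reads (and possibly writes; there was debate about whether sfence was needed in the Linux kernel). This lets the CPU prefetch data more efficiently, but it means that a[0]=123 followed by a read of a[1] in thread A, and a[1]=456 followed by a read of a[0] in thread B, can both return 0, because each thread's fetch can happen before the other thread's store becomes visible. There are two ways to fix this: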

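Only access shared data while you hold a lock. In particular, do not read shared data outside the lock. Whether that means a lock per entry or one lock for the whole array is up to you, and depends on what you think lock contention is likely to be (hint: it is usually not very high).
Put a memory barrier between the things that need to be ordered. This is hard to get right (pthreads does not even have memory barriers; pthread_barrier is more like a synchronisation point).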
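The first fix can be sketched with the a[0]/a[1] example: each thread does both its write and its read while holding one mutex, so whichever thread runs second is guaranteed to see the first thread's write. The names `a_lock`, `thread_a`, `thread_b` and `run_locked_once` are hypothetical, chosen only for this illustration, and a single lock for the whole array is assumed.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical shared state for this sketch: one lock guards the whole array.
static int a[2] = {0, 0};
static pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;

// Thread A: write a[0], then read a[1], all while holding the lock.
static void *thread_a(void *out) {
    pthread_mutex_lock(&a_lock);
    a[0] = 123;
    *static_cast<int *>(out) = a[1];
    pthread_mutex_unlock(&a_lock);
    return NULL;
}

// Thread B: write a[1], then read a[0], all while holding the lock.
static void *thread_b(void *out) {
    pthread_mutex_lock(&a_lock);
    a[1] = 456;
    *static_cast<int *>(out) = a[0];
    pthread_mutex_unlock(&a_lock);
    return NULL;
}

// Runs both threads once; returns true if at least one thread
// observed the other thread's write.
bool run_locked_once() {
    a[0] = 0;
    a[1] = 0;
    int seen_by_a = -1, seen_by_b = -1;
    pthread_t ta, tb;
    pthread_create(&ta, NULL, thread_a, &seen_by_a);
    pthread_create(&tb, NULL, thread_b, &seen_by_b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return seen_by_a == 456 || seen_by_b == 123;
}
```

With the lock in place the "both reads return 0" outcome cannot occur, so `run_locked_once()` should return true on every run.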

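While memory barriers are the recent trend, locking is far easier to get right (I hold the lock, so nobody else can change the data under my feet). Memory barriers are all the rage in some circles, but there is a lot more to get right (I hope this read is atomic, I hope the other threads write atomically, I hope the other threads use a barrier, and oh yes, I need to use a barrier too).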
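For comparison, here is a sketch of the barrier approach applied to the same a[0]/a[1] pattern. Since pthreads has no memory barrier, this swaps in C++11 `std::atomic` and `std::atomic_thread_fence`, which are not part of the original discussion; the names `x`, `fence_a`, `fence_b` and `run_fenced_once` are hypothetical. Note how every item on the checklist above shows up: atomic reads, atomic writes, and a fence in both threads.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

// Hypothetical shared state for this sketch: atomics instead of plain ints.
static std::atomic<int> x[2];

// Thread A: store x[0], full fence, then load x[1].
static void *fence_a(void *out) {
    x[0].store(123, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    *static_cast<int *>(out) = x[1].load(std::memory_order_relaxed);
    return NULL;
}

// Thread B: store x[1], full fence, then load x[0].
static void *fence_b(void *out) {
    x[1].store(456, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    *static_cast<int *>(out) = x[0].load(std::memory_order_relaxed);
    return NULL;
}

// Runs both threads once; returns true if at least one thread
// observed the other thread's store.
bool run_fenced_once() {
    x[0].store(0);
    x[1].store(0);
    int seen_by_a = -1, seen_by_b = -1;
    pthread_t ta, tb;
    pthread_create(&ta, NULL, fence_a, &seen_by_a);
    pthread_create(&tb, NULL, fence_b, &seen_by_b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return seen_by_a == 456 || seen_by_b == 123;
}
```

The two sequentially consistent fences are what rule out both loads returning 0; removing either one, or making either access a plain int, would lose that guarantee, which illustrates why this route is easier to get wrong than a lock.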

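If locking is too slow, reducing contention will be far more effective than replacing locks with barriers and hoping you got it right.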

A simple boss-worker model using pthreads
