Question

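I am debugging a multithreaded (pthread) C++ program on Linux.

It works fine when the number of threads is small, e.g. 1, 2, or 3.

When the number of threads increases, I get SIGSEGV (segmentation fault, UNIX signal 11).

However, once I raise the thread count above 4, the error sometimes appears and sometimes disappears.

I ran valgrind and got: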

==29655== Process terminating with default action of signal 11 (SIGSEGV)
==29655==  Access not within mapped region at address 0xFFFFFFFFFFFFFFF8
==29655==    at 0x3AEB69CA3E: std::string::assign(std::string const&) (in /usr/lib64/libstdc++.so.6.0.8)
==29655==    by 0x42A93C: bufferType::getSenderID(std::string&) const (boundedBuffer.hpp:29)

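It seems my code is trying to read memory that has not been allocated. However, I cannot find any bug in the function getSenderID(). It only returns a string data member of the class bufferType, and that member has been initialized.

I used GDB and DDD (a GDB GUI) to look for the bug. They point to the same place, but because the error sometimes disappears, I cannot catch it with a breakpoint in GDB.

I also printed the values in the function that valgrind points to, but that did not help: the threads print their results in different orders and the output is interleaved. The printout is different on every run.

The bufferType objects live in a map, and the map may have multiple entries. Each entry can be written by one thread and read by another thread at the same time. I now use pthread read/write locks (pthread_rwlock_t) to guard them. With the locks there is no SIGSEGV any more, but the program stalls at some point without making progress. I think this is a deadlock. But a given map entry is written by only one thread at any point in time, so why is there still a deadlock?

Could you recommend some ways to catch the bug, so that I can find it no matter how many threads I run the code with?

Thanks

The code of boundedBuffer.hpp is as follows: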

 class bufferType
 {
 private:
    string senderID;    // who write the buffer
    string recvID;      // who should read the buffer
    string arcID;       // which arc is updated
    double price;       // write node's price
    double arcValue;    // this arc flow value
    bool   updateFlag;
    double arcCost;
    int    arcFlowUpBound;

    //boost::mutex  senderIDMutex;
    //pthread_mutex_t  senderIDMutex;
    pthread_rwlock_t senderIDrwlock;
    pthread_rwlock_t setUpdateFlaglock;

 public:
    //typedef boost::mutex::scoped_lock lock;  // synchronous read / write

    bufferType(){}

    void   getPrice(double& myPrice) const { myPrice = price; }
    void   getArcValue(double& myArcValue) const { myArcValue = arcValue; }
    void   setPrice(double& myPrice) { price = myPrice; }
    void   setArcValue(double& myValue) { arcValue = myValue; }
    void   readBuffer(double& myPrice, double& myArcValue);
    void   writeBuffer(double& myPrice, double& myArcValue);

    void   getSenderID(string& myID)
    {
        //boost::mutex::scoped_lock lock(senderIDMutex);
        //pthread_rwlock_rdlock(&senderIDrwlock);
        cout << "senderID is " << senderID << endl;
        myID = senderID;
        //pthread_rwlock_unlock(&senderIDrwlock);
    }

    //void   setSenderID(string& myID){ senderID = myID ;}
    void   setSenderID(string& myID)
    {
        pthread_rwlock_wrlock(&senderIDrwlock);
        senderID = myID;
        pthread_rwlock_unlock(&senderIDrwlock);
    }

    void   getRecvID(string& myID) const { myID = recvID; }
    void   setRecvID(string& myID) { recvID = myID; }
    void   getArcID(string& myID) const { myID = arcID; }
    void   setArcID(string& myID) { arcID = myID; }

    void   getUpdateFlag(bool& myFlag)
    {
        myFlag = updateFlag;
        if (updateFlag)
            updateFlag = false;
    }

    //void   setUpdateFlag(bool myFlag){ updateFlag = myFlag ;}
    void   setUpdateFlag(bool myFlag)
    {
        pthread_rwlock_wrlock(&setUpdateFlaglock);
        updateFlag = myFlag;
        pthread_rwlock_unlock(&setUpdateFlaglock);
    }

    void   getArcCost(double& myc) const { myc = arcCost; }
    void   setArcCost(double& myc) { arcCost = myc; }
    void   setArcFlowUpBound(int& myu) { arcFlowUpBound = myu; }
    int    getArcFlowUpBound() { return arcFlowUpBound; }

    //double getLastPrice() const {return price; }
 };

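As you can see from the code, I tried to use read/write locks to preserve the invariants. Each entry in the map holds a buffer like the one above. Now I am running into a deadlock.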

Answer 1

Access not within mapped region at address 0xFFFFFFFFFFFFFFF8

at 0x3AEB69CA3E: std::string::assign(std::string const&)

This usually means that you are assigning to a string* that was NULL and then got decremented. For example:

#include <string>

int main()
{
  std::string *s = NULL;

  --s;
  s->assign("abc");
}

g++ -g t.cc && valgrind -q ./a.out

...
==20980== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==20980==  Access not within mapped region at address 0xFFFFFFFFFFFFFFF8
==20980==    at 0x4EDCBE6: std::string::assign(char const*, unsigned long)
==20980==    by 0x400659: main (/tmp/t.cc:8)

...

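So show us the code in boundedBuffer.hpp (with line numbers), and think about how that code could end up with a string pointer that points at -8.

Could you recommend some ways to catch the bug, so that I can find it no matter how many threads I run the code with?

When thinking about multithreaded programs, you must think about invariants. You should put in assertions to confirm that your invariants actually hold. You should think about how they could be violated, and which violations would produce the post-mortem state you have observed.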
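
As a rough illustration of asserting invariants, here is a minimal sketch. CheckedBuffer is a hypothetical stand-in for bufferType cut down to one field, and the "IDs are never empty" rule is an assumed invariant chosen only for the example. It also asserts that every pthread call succeeds, since a lock that was never initialized (the posted constructor is empty) can fail or hang in ways that are easy to miss.

```cpp
#include <cassert>
#include <pthread.h>
#include <string>

// Hypothetical stand-in for bufferType, reduced to one guarded field.
class CheckedBuffer {
public:
    CheckedBuffer() {
        // A pthread_rwlock_t must be initialized before its first use.
        int rc = pthread_rwlock_init(&lock_, NULL);
        assert(rc == 0);
        (void)rc;  // silences the unused warning when built with NDEBUG
    }

    ~CheckedBuffer() {
        int rc = pthread_rwlock_destroy(&lock_);
        assert(rc == 0);
        (void)rc;
    }

    void setSenderID(const std::string& id) {
        assert(!id.empty());  // assumed invariant: an ID is never empty
        int rc = pthread_rwlock_wrlock(&lock_);
        assert(rc == 0);
        senderID_ = id;
        rc = pthread_rwlock_unlock(&lock_);
        assert(rc == 0);
        (void)rc;
    }

    std::string getSenderID() {
        int rc = pthread_rwlock_rdlock(&lock_);
        assert(rc == 0);
        std::string copy = senderID_;  // copy while the read lock is held
        rc = pthread_rwlock_unlock(&lock_);
        assert(rc == 0);
        (void)rc;
        return copy;
    }

private:
    CheckedBuffer(const CheckedBuffer&);             // non-copyable: a lock
    CheckedBuffer& operator=(const CheckedBuffer&);  // must not be duplicated

    pthread_rwlock_t lock_;
    std::string senderID_;
};
```

Build with g++ -g -pthread and without -DNDEBUG so the assertions stay active; a failed assertion then stops the program at the first broken invariant instead of at a later crash.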

Answer 2

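Do you have any situation where an object (such as a string) is accessed in one thread while another thread is, or might be, modifying it? That is the usual cause of this kind of problem.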
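
In the posted code, setSenderID() takes the write lock but the read lock in getSenderID() is commented out, so a reader can copy the string while a writer is replacing it. The sketch below shows both sides taking the same lock. SharedID, writerThread, readerThread and runReaderWriterDemo are hypothetical names made up for this example, and the explicit pthread_rwlock_init call is an assumption about what the real program needs, since the question does not show where its locks are initialized.

```cpp
#include <cassert>
#include <pthread.h>
#include <string>

// Hypothetical shared record: one string guarded by one rwlock.
struct SharedID {
    pthread_rwlock_t lock;
    std::string value;
    long validReads;  // written only by the reader thread, read after join
};

static const int kIterations = 10000;

// Writer: alternates between two values while holding the write lock.
static void* writerThread(void* arg) {
    SharedID* s = static_cast<SharedID*>(arg);
    for (int i = 0; i < kIterations; ++i) {
        pthread_rwlock_wrlock(&s->lock);
        s->value = (i % 2 == 0) ? "alpha" : "beta";
        pthread_rwlock_unlock(&s->lock);
    }
    return NULL;
}

// Reader: copies the string while holding the read lock, then checks the copy.
static void* readerThread(void* arg) {
    SharedID* s = static_cast<SharedID*>(arg);
    long valid = 0;
    for (int i = 0; i < kIterations; ++i) {
        pthread_rwlock_rdlock(&s->lock);
        std::string copy = s->value;
        pthread_rwlock_unlock(&s->lock);
        if (copy == "alpha" || copy == "beta") ++valid;
    }
    s->validReads = valid;
    return NULL;
}

// Runs one writer and one reader; returns how many reads saw a whole value.
long runReaderWriterDemo() {
    SharedID s;
    s.value = "alpha";
    s.validReads = 0;
    int rc = pthread_rwlock_init(&s.lock, NULL);  // initialize before any use
    assert(rc == 0);

    pthread_t w, r;
    rc = pthread_create(&w, NULL, writerThread, &s);
    assert(rc == 0);
    rc = pthread_create(&r, NULL, readerThread, &s);
    assert(rc == 0);
    (void)rc;

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    pthread_rwlock_destroy(&s.lock);
    return s.validReads;
}
```

Call runReaderWriterDemo() from main and build with g++ -g -pthread. The reader copies the string inside the locked region and works on the copy afterwards, so it never holds a reference to data a writer may be changing.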

Answer 3

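Look at your instance of bufferType.

When is it instantiated?

If it is instantiated before the threads are spawned and one of the threads then modifies it, you have a race condition with no locking.

Also, watch out for any static variables near or inside bufferType.

From the look of it, one of the threads may have modified the member that getSenderID() returns.

If none of these turns out to be the cause of your error, try valgrind's drd.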
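
To get a feel for drd before pointing it at the real program, a small test case helps. The sketch below is a made-up example (g_counter, incrementThread and runCounterDemo are not from the question): two threads increment a shared counter under a mutex. As written it is race-free; commenting out the lock and unlock lines introduces a data race that drd should flag as conflicting accesses to g_counter.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;
static const int kIncrements = 100000;

// Each thread bumps the shared counter kIncrements times.
static void* incrementThread(void*) {
    for (int i = 0; i < kIncrements; ++i) {
        pthread_mutex_lock(&g_mutex);    // remove this line and the unlock
        ++g_counter;                     // below to create a data race
        pthread_mutex_unlock(&g_mutex);  // for drd to report
    }
    return NULL;
}

// Starts two incrementing threads and returns the final counter value.
long runCounterDemo() {
    g_counter = 0;
    pthread_t a, b;
    int rc = pthread_create(&a, NULL, incrementThread, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, incrementThread, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return g_counter;
}
```

Call runCounterDemo() from main, build with g++ -g -pthread, and run the binary as valgrind --tool=drd ./a.out. Unlike a breakpoint in GDB, the tool checks every access on the run it observes, so it can report a race even on a run where the crash does not happen.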

How do I find a (segmentation fault) bug in a multithreaded C++ (pthread) program on Linux?
