Multithreaded list sharing performance

Date: 2016-08-26 09:52:00

Tags: c++ multithreading linked-list

I am developing an application that reads data at roughly 800 Mbps from a named pipe on Windows 7. I had to split it into several threads, because if I cannot keep reading at that rate, the FIFO on the other side of the pipe overflows. The performance is really poor, though, and I cannot understand why. I have already read up on several things and tried to split the memory up to avoid false sharing.

At first I thought the problem might be contention on contiguous memory, but the memory chunks are queued in a list and the main thread never touches them again after enqueuing them. The amount of memory involved is huge, so I would not expect them to land on the same pages.
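For reference, the "false sharing" mentioned above is usually avoided by padding per-thread data out to a full cache line. A minimal sketch, not part of the original code (the 64-byte line size and the `PaddedCounter` name are assumptions):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical illustration of avoiding false sharing: each counter is
// aligned to (and padded out to) a 64-byte cache line, so two threads
// incrementing adjacent counters never contend for the same line.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == 64, "aligned to a cache line");
static_assert(sizeof(PaddedCounter) >= 64, "occupies a full line");
```

Whether this applies here depends on whether two threads actually write to the same cache line; as the post notes, once buffers are handed off through the list, the producer no longer touches them.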

This is the worker thread function:

void splitMessage(){
    char* bufferMSEO;
    char* bufferMDO;
    std::list<struct msgBufferStr*> localBufferList;

    while(1)
    {
        long bytesProcessed = 0;
        {
            std::unique_lock<std::mutex> lk(bufferMutex);
            while(bufferList.empty())
            {
            // Wait until the list has data
                listReady.wait(lk);
            }   
            //Extract the data from the list and copy to the local list
            localBufferList.splice(localBufferList.end(),bufferList);

            //Unlock the mutex and notify
            // Manual unlocking is done before notifying, to avoid waking up
            // the waiting thread only to block again (see notify_one for details)
            lk.unlock();
            //listReady.notify_one();
        }

        for(auto nextBuffer = localBufferList.begin(); nextBuffer != localBufferList.end(); nextBuffer++)
        {
            //nextBuffer = it->second();
            bufferMDO = (*nextBuffer)->MDO;
            bufferMSEO = (*nextBuffer)->MSEO;
            bytesProcessed += (*nextBuffer)->size; 

             //Process the data Stream            
              for(int k=0; k<(*nextBuffer)->size; k++)
              { 

              }   

            //localBufferList.remove(*nextBuffer);
            free(bufferMDO);
            free(bufferMSEO);
            free(*nextBuffer);
        }
        localBufferList.clear();

    }

}
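As an aside, the manual `while(bufferList.empty()) listReady.wait(lk);` loop above can be written with the predicate overload of `wait()`, which is equivalent but re-checks the condition for you after spurious wakeups. A minimal sketch with simplified types (the globals stand in for the post's `bufferMutex`, `listReady` and `bufferList`; `drainQueue` is a hypothetical name):

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>

std::mutex bufferMutex;
std::condition_variable listReady;
std::list<int> bufferList;                 // stands in for the msgBufferStr* list

// Block until the shared list is non-empty, then take its contents in O(1).
std::list<int> drainQueue() {
    std::list<int> local;
    std::unique_lock<std::mutex> lk(bufferMutex);
    listReady.wait(lk, [] { return !bufferList.empty(); });
    local.splice(local.end(), bufferList); // same O(1) hand-off as in the post
    return local;
}
```

`splice` moves the nodes without copying, so the lock is held only briefly; that part of the original design is sound.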

And here is the thread that reads the data and enqueues it:

DWORD WINAPI InstanceThread(LPVOID lpvParam)
// This routine is a thread processing function to read from and reply to a client
// via the open pipe connection passed from the main loop. Note this allows
// the main loop to continue executing, potentially creating more threads of
// of this procedure to run concurrently, depending on the number of incoming
// client connections.
{ 
   HANDLE hHeap      = GetProcessHeap();
   TCHAR* pchRequest = (TCHAR*)HeapAlloc(hHeap, 0, BUFSIZE*sizeof(TCHAR));

   DWORD cbBytesRead = 0, cbReplyBytes = 0, cbWritten = 0; 
   BOOL fSuccess = FALSE;
   HANDLE hPipe  = NULL;
   double totalRxData = 0;

   char* bufferPnt;

   char* bufferMDO;
   char* bufferMSEO;

   char* destPnt;



    // Do some extra error checking since the app will keep running even if this
   // thread fails.
   if (lpvParam == NULL)
   {
       printf( "\nERROR - Pipe Server Failure:\n");
       printf( "   InstanceThread got an unexpected NULL value in lpvParam.\n");
       printf( "   InstanceThread exiting.\n");
       if (pchRequest != NULL) HeapFree(hHeap, 0, pchRequest);
       return (DWORD)-1;
   }

   if (pchRequest == NULL)
   {
       printf( "\nERROR - Pipe Server Failure:\n");
       printf( "   InstanceThread got an unexpected NULL heap allocation.\n");
       printf( "   InstanceThread exiting.\n");
       return (DWORD)-1;
   }

   // Print verbose messages. In production code, this should be for debugging only.
   printf("InstanceThread created, receiving and processing messages.\n");

    // The thread's parameter is a handle to a pipe object instance.  
   hPipe = (HANDLE) lpvParam; 


    try
    {
        msgSplitter = std::thread(&splitMessage);
        //msgSplitter.detach();
    }
    catch(...)
    {
        _tprintf(TEXT("std::thread creation failed.\n"));
        return (DWORD)-1;
    }


   while (1) 
   { 
       struct msgBufferStr *newBuffer = (struct msgBufferStr* )malloc(sizeof(struct msgBufferStr));
   // Read client requests from the pipe. This simplistic code only allows messages
   // up to BUFSIZE characters in length.
      fSuccess = ReadFile( 
         hPipe,        // handle to pipe 
         pchRequest,    // buffer to receive data 
         BUFSIZE*sizeof(TCHAR), // size of buffer 
         &cbBytesRead, // number of bytes read 
         NULL);        // not overlapped I/O 

      if (!fSuccess || cbBytesRead == 0)
      {   
          if (GetLastError() == ERROR_BROKEN_PIPE)
          {
              _tprintf(TEXT("InstanceThread: client disconnected.\n"));
              break;
          }
          else if (GetLastError() == ERROR_MORE_DATA)
          {
          }
          else
          {
              _tprintf(TEXT("InstanceThread ReadFile failed, GLE=%d.\n"), GetLastError()); 
          }          
      }

      //timeStart = omp_get_wtime();

      bufferPnt = (char*)pchRequest;
      totalRxData += ((double)cbBytesRead)/1000000;

      bufferMDO =  (char*) malloc(cbBytesRead);
      bufferMSEO =  (char*) malloc(cbBytesRead/3);
      destPnt =  bufferMDO;   

      //#pragma omp parallel for
      for(int i = 0; i < cbBytesRead/12; i++)
      {
          msgCounter++;
          if(*(bufferPnt + (i * 12)) == 0) continue;
          if(*(bufferPnt + (i * 12)) == 8)
          {
              errorCounter++;
              continue;
          }

          //Use 64 bits variables in order to make less operations
          unsigned long long *sourceAddrLong = (unsigned long long*) (bufferPnt + (i * 12));
          unsigned long long *destPntLong = (unsigned long long*) (destPnt + (i * 8));
          //Copy the data bytes from source to destination
          *destPntLong = *sourceAddrLong;

          //Copy and prepare the MSEO lines for the data processing
          bufferMSEO[i*4]=(bufferPnt[(i * 12) + 8] & 0x03);
          bufferMSEO[i*4 + 1]=(bufferPnt[(i * 12) + 8] & 0x0C) >> 2;
          bufferMSEO[i*4 + 2]=(bufferPnt[(i * 12) + 8] & 0x30) >> 4;
          bufferMSEO[i*4 + 3]=(bufferPnt[(i * 12) + 8] & 0xC0) >> 6;

      }

      newBuffer->size = cbBytesRead/3;
      newBuffer->MDO = bufferMDO;
      newBuffer->MSEO = bufferMSEO;

    {
        //lock the mutex
        std::lock_guard<std::mutex> lk(bufferMutex);
        //add data to the list
        bufferList.push_back(newBuffer);
    } // bufferMutex is automatically released when lk goes out of scope

        //Notify
        listReady.notify_one();

  }

    // Flush the pipe to allow the client to read the pipe's contents 
    // before disconnecting. Then disconnect the pipe, and close the 
    // handle to this pipe instance. 

   FlushFileBuffers(hPipe); 
   DisconnectNamedPipe(hPipe); 
   CloseHandle(hPipe); 

   HeapFree(hHeap, 0, pchRequest);

   //Show memory leak issues
  _CrtDumpMemoryLeaks();

  //TODO: Join thread

   printf("InstanceThread exiting.\n");
   return 1;
}
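For what it's worth, the MSEO extraction inside the reader's loop can be isolated and checked on its own. In each 12-byte record, byte 8 carries four 2-bit MSEO codes, extracted lowest bits first, exactly as the four shift-and-mask lines above do. A sketch (the `unpackMseo` name is mine, not from the post):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Extract the four 2-bit MSEO codes packed into one byte, low bits first,
// mirroring the bufferMSEO[i*4 .. i*4+3] assignments in InstanceThread.
std::array<uint8_t, 4> unpackMseo(uint8_t b) {
    return {
        static_cast<uint8_t>( b       & 0x03),
        static_cast<uint8_t>((b >> 2) & 0x03),
        static_cast<uint8_t>((b >> 4) & 0x03),
        static_cast<uint8_t>((b >> 6) & 0x03),
    };
}
```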

What really puzzles me is that even though the first thread finished reading the data long ago, the splitMessage thread still takes several minutes to work through it. I mean, the reading thread reads 1.5 GB of information in a few seconds and then sits waiting for more data from the pipe, while that data is processed by the splitting thread (the only one really "doing" anything) for almost a minute or more. On top of that, CPU usage stays below 20%. (And this is an i7 laptop with 16 GB of RAM and 8 cores!)

On the other hand, if I simply comment out the for loop in the processing thread:

for(int k=0; k<(*nextBuffer)->size; k++)

then the data is read too slowly and the FIFO on the other side of the pipe overflows. With 8 processors running at more than 2 GHz, it should be able to chew through the buffers without much trouble, shouldn't it? I think it must be a memory-access problem, or the scheduler is somehow putting the thread to sleep, but I cannot figure out why!! Another possibility is that iterating through the linked list with an iterator is not optimal.
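On that last suspicion: iterating a `std::list` chases one pointer per element, while a `std::vector` walks contiguous, prefetch-friendly memory, so traversal locality can differ a lot even though the results are identical. A hypothetical comparison harness (the function names are mine, not from the post):

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Sum through a linked list: one potentially cache-missing node per element.
long long sumList(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0LL);
}

// Sum through a vector: contiguous, prefetch-friendly traversal.
long long sumVector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```

Timing these two over the same data would show whether list traversal itself, rather than the work done per element, is what dominates here.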

Any help would be great, as I have been trying to understand this for days; I have made several changes to the code, tried to simplify it as much as possible, and I am going crazy :).

Best regards, Manuel

0 Answers:

There are no answers yet.