Best way to process multi-line files with multiple threads

Time: 2017-08-07 11:12:43

Tags: multithreading io fread

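I have several separate files. I want to process every line of each file (sequentially and independently), and I want it to be fast.

So I wrote code that reads a large chunk of a file into a buffer in RAM; multiple threads then compete to read lines from the buffer and process them. The pseudocode is as follows: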

do{
  do{      

    fread(buffer,500MB,1,file);
    // creating threads
    // let the threads compete to read from buffer and PROCESS independently
    // end of threads

  }while( EOF not reached )
  file = nextfile;
}while( there is another file to read )

Or this:

void mt_ReadAndProcess(){
  lock();
  fread(buffer,50MB,1,file);
  if(EOF reached)
    file = nextfile;
  unlock();
  process();
}
main(){
  // create multi threads
  // call mt_ReadAndProcess() with multi threads
}

process() is an expensive (time-consuming) operation.
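A minimal sketch of the second variant, reading one line at a time instead of a 50MB chunk so that each thread owns its own line buffer. The names sharedInput, processLine, lineWorker, processFile and MAX_THREADS are hypothetical, and processLine is only a stand-in for the real process() step. Lines longer than MAX_LINE_LEN - 1 characters would be split and counted more than once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_LINE_LEN 1024
#define MAX_THREADS 64

// state shared by all worker threads (hypothetical names)
struct sharedInput
{
    FILE *fp;
    pthread_mutex_t lock;
    long linesRead;
};

// stand-in for the expensive process() step
static void processLine( const char *line )
{
    ( void ) line;
}

static void *lineWorker( void *arg )
{
    struct sharedInput *in = arg;
    char line[ MAX_LINE_LEN ];

    for ( ;; )
    {
        // only the read itself is serialized
        pthread_mutex_lock( &in->lock );
        char *got = fgets( line, sizeof( line ), in->fp );
        if ( got )
        {
            in->linesRead++;
        }
        pthread_mutex_unlock( &in->lock );

        if ( !got )
        {
            break;
        }

        // the expensive part runs outside the lock, in parallel
        processLine( line );
    }

    return( NULL );
}

// process every line of fp with nThreads workers;
// returns the number of lines read, or -1 on error
long processFile( FILE *fp, int nThreads )
{
    assert( fp != NULL );

    if ( ( nThreads < 1 ) || ( nThreads > MAX_THREADS ) )
    {
        return( -1 );
    }

    struct sharedInput in = { .fp = fp, .linesRead = 0 };
    pthread_t tids[ MAX_THREADS ];
    int started = 0;

    pthread_mutex_init( &in.lock, NULL );

    for ( ; started < nThreads; started++ )
    {
        if ( pthread_create( &tids[ started ], NULL, lineWorker, &in ) != 0 )
        {
            break;
        }
    }

    for ( int ii = 0; ii < started; ii++ )
    {
        pthread_join( tids[ ii ], NULL );
    }

    pthread_mutex_destroy( &in.lock );

    return( ( started > 0 ) ? in.linesRead : -1 );
}
```

A caller would open each file in turn and call processFile( fp, 8 ) on it. Whether this beats a single reader thread depends on how expensive the processing is compared with the locked fgets() call.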

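Is there a better way to do this? Is there a faster way to read the files, or a better way to process them with multiple threads?

Thanks all,

Amir.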

1 answer:

Answer 0: (score: 0)

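Why do you want the threads to "compete to read from the buffer"? The data can easily be partitioned by the thread that does the reading. Having threads contend for data from the buffer gains nothing, and it will likely waste both CPU time and wall-clock time.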
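One way to partition a buffer that has already been read, sketched under the assumption that lines end in '\n': split it into roughly equal slices and push each boundary forward to the end of the line it falls in, so every thread receives whole lines and needs no locking. The struct chunk type and the partitionLines function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// one contiguous slice of the buffer, made up of whole lines only
struct chunk
{
    const char *start;
    size_t len;
};

// split buf[0..len) into at most nChunks slices of roughly equal size,
// moving each boundary forward to just past the next '\n' so that no
// line is split between two slices; returns the number of slices used
size_t partitionLines( const char *buf, size_t len,
                       struct chunk *chunks, size_t nChunks )
{
    assert( ( buf != NULL ) && ( chunks != NULL ) && ( nChunks > 0 ) );

    size_t used = 0;
    size_t pos = 0;

    for ( size_t ii = 0; ( ii < nChunks ) && ( pos < len ); ii++ )
    {
        // ideal end of this slice if lines could be cut anywhere;
        // the last slice always takes whatever is left
        size_t end = ( ii + 1 == nChunks ) ? len : ( len * ( ii + 1 ) ) / nChunks;
        if ( end < pos )
        {
            end = pos;
        }

        // extend the slice to the end of the line containing the boundary
        if ( end < len )
        {
            const char *nl = memchr( buf + end, '\n', len - end );
            end = nl ? ( size_t ) ( nl - buf ) + 1 : len;
        }

        chunks[ used ].start = buf + pos;
        chunks[ used ].len = end - pos;
        used++;
        pos = end;
    }

    return( used );
}
```

Each worker thread would then be handed one struct chunk and would scan only its own slice. A final line without a trailing '\n' ends up in the last slice used.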

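Since you're processing line by line, just read lines from the file and pass the buffers to the worker threads by pointer.

Assuming you're running on a POSIX-compliant system, do something like this: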

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_LINE_LEN 1024
#define NUM_THREADS 8

// linePipe carries pointers to lines sent to
// worker threads (index 0 is the read end of
// a pipe, index 1 is the write end)
static int linePipe[ 2 ];

// bufferPipe carries pointers to buffers returned
// from worker threads and reused to read data
static int bufferPipe[ 2 ];

// thread function that actually does the work
void *threadFunc( void *arg )
{
    const char *linePtr;

    for ( ;; )
    {
        // get a pointer to a line from the read end of the line pipe
        read( linePipe[ 0 ], &linePtr, sizeof( linePtr ) );

        // end loop on NULL linePtr value
        if ( !linePtr )
        {
            break;
        }

        // process line

        // return the buffer through the write end of the buffer pipe
        write( bufferPipe[ 1 ], &linePtr, sizeof( linePtr ) );
    }

    return( NULL );
}

int main( int argc, char **argv )
{
    pipe( linePipe );
    pipe( bufferPipe );

    // create buffers and load them into the buffer pipe for reading
    for ( int ii = 0; ii < ( 2 * NUM_THREADS ); ii++ )
    {
        char *buffer = malloc( MAX_LINE_LEN );
        write( bufferPipe[ 1 ], &buffer, sizeof( buffer ) );
    }

    pthread_t tids[ NUM_THREADS ];
    for ( int ii = 0; ii < NUM_THREADS; ii++ )
    {
        pthread_create( &( tids[ ii ] ), NULL, threadFunc, NULL );
    }

    FILE *fp = ...

    for ( ;; )
    {
        char *linePtr;

        // get the pointer to a free buffer from the buffer pipe
        read( bufferPipe[ 0 ], &linePtr, sizeof( linePtr ) );

        // read a line from the current file into the buffer
        char *result = fgets( linePtr, MAX_LINE_LEN, fp );

        if ( result )
        {
            // send the line to the worker threads
            write( linePipe[ 1 ], &linePtr, sizeof( linePtr ) );
        }
        else
        {
            // nothing was read, so put the unused buffer back
            write( bufferPipe[ 1 ], &linePtr, sizeof( linePtr ) );

            // either end loop, or open another file
            fclose( fp );
            fp = fopen( ... );
            if ( !fp )
            {
                // no more files to read
                break;
            }
        }
    }

    // clean up and exit

    // send NULL to cause worker threads to stop
    char *nullPtr = NULL;
    for ( int ii = 0; ii < NUM_THREADS; ii++ )
    {
        write( linePipe[ 1 ], &nullPtr, sizeof( nullPtr ) );
    }

    // wait for worker threads to stop
    for ( int ii = 0; ii < NUM_THREADS; ii++ )
    {
        pthread_join( tids[ ii ], NULL );
    }

    return( 0 );
}
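The read() and write() calls above ignore their return values to keep the listing short. A minimal sketch of checked helpers follows; sendPtr and recvPtr are hypothetical names, not part of POSIX or of the code above. POSIX guarantees that a write of at most PIPE_BUF bytes to a pipe is not interleaved with other writes, so pointer-sized messages written by several threads stay intact.

```c
#include <assert.h>
#include <unistd.h>

// send one pointer value through the write end of a pipe;
// returns 0 on success, -1 on error or short write
int sendPtr( int writeFd, const void *ptr )
{
    assert( writeFd >= 0 );

    ssize_t n = write( writeFd, &ptr, sizeof( ptr ) );

    return( ( n == ( ssize_t ) sizeof( ptr ) ) ? 0 : -1 );
}

// receive one pointer value from the read end of a pipe;
// returns 0 on success, -1 on error, short read, or end of file
int recvPtr( int readFd, void **ptr )
{
    assert( ( readFd >= 0 ) && ( ptr != NULL ) );

    ssize_t n = read( readFd, ptr, sizeof( *ptr ) );

    return( ( n == ( ssize_t ) sizeof( *ptr ) ) ? 0 : -1 );
}
```

With these, a worker could stop both on a NULL pointer and on a failed recvPtr call, for example when every write end of the pipe has been closed.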