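I have several separate files, and I want to process every line of each file (sequentially and independently). I want it to be fast.
So I wrote code that reads a large chunk of a file into a buffer in RAM, after which multiple threads compete to read lines from the buffer and process them. The pseudocode is as follows: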
do{
    do{
        fread(buffer,500MB,1,file);
        // creating threads
        // let the threads compete to read from buffer and PROCESS independently
        // end of threads
    }while( EOF not reached )
    file = nextfile;
}while( there is another file to read )
Or this:
void mt_ReadAndProcess(){
    lock();
    fread(buffer,50MB,1,file);
    if(EOF reached)
        file = nextfile;
    unlock();
    process();
}
main(){
    // create multi threads
    // call mt_ReadAndProcess() with multi threads
}
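process() is an expensive (time-consuming) operation.
Is there a better way to do this? What is a faster way to read the files, or to process them with multiple threads?
Thanks, all,
Amir.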
Answer 0 (score: 0)
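Why do you want the threads to "compete to read from the buffer"? The thread that does the reading can easily partition the data as it reads it. Having threads contend for data from a shared buffer gains nothing, and it can waste both CPU time and wall-clock time.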
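As an illustration of partitioning without contention, here is a minimal sketch that splits an in-memory buffer into line-aligned slices, one per worker, so that no two threads ever touch the same bytes. The `Slice` type and the `partitionLines` function are hypothetical names introduced only for this example; they are not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// One contiguous, line-aligned piece of the buffer (hypothetical type).
typedef struct
{
    const char *start;
    size_t      len;
} Slice;

// Split buf[0..len) into at most nparts slices. Every slice except
// possibly the last ends just after a '\n'. Returns the number of
// slices written to out[], which must have room for nparts entries.
size_t partitionLines( const char *buf, size_t len, size_t nparts, Slice *out )
{
    assert( nparts > 0 );
    size_t count = 0;
    size_t pos = 0;
    while ( pos < len && count < nparts )
    {
        size_t end = len;  // the last slice takes whatever remains
        if ( count < nparts - 1 )
        {
            // aim for an even share of the remaining bytes, then
            // extend the slice to the end of the line it lands in
            size_t target = pos + ( len - pos ) / ( nparts - count );
            const char *nl = memchr( buf + target, '\n', len - target );
            end = nl ? ( size_t )( nl - buf ) + 1 : len;
        }
        out[ count ].start = buf + pos;
        out[ count ].len = end - pos;
        count++;
        pos = end;
    }
    return count;
}
```

Each worker would then be handed exactly one `Slice` and scan it for lines on its own, with no locking on the buffer. If the buffer was filled by a fixed-size `fread`, the last slice may end with a partial line that has to be carried over to the next chunk.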
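Since you are processing line by line, just read lines from the file and pass the buffers to the worker threads by pointer.
Assuming you are running on a POSIX-compliant system, something like this: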
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_LINE_LEN 1024
#define NUM_THREADS 8
// linePipe holds pointers to lines sent to
// worker threads
static int linePipe[ 2 ];
// bufferPipe holds pointers to buffers returned
// from worker threads and used to read data
static int bufferPipe[ 2 ];
// thread function that actually does the work
void *threadFunc( void *arg )
{
    const char *linePtr;
    for ( ;; )
    {
        // get a pointer to a line from the read end of the line pipe
        read( linePipe[ 0 ], &linePtr, sizeof( linePtr ) );
        // end loop on NULL linePtr value
        if ( !linePtr )
        {
            break;
        }
        // process line
        // return the buffer through the write end of the buffer pipe
        write( bufferPipe[ 1 ], &linePtr, sizeof( linePtr ) );
    }
    return( NULL );
}
int main( int argc, char **argv )
{
    // for each pipe, index 0 is the read end and index 1 is the write end
    pipe( linePipe );
    pipe( bufferPipe );
    // create buffers and load them into the buffer pipe for reading
    for ( int ii = 0; ii < ( 2 * NUM_THREADS ); ii++ )
    {
        char *buffer = malloc( MAX_LINE_LEN );
        write( bufferPipe[ 1 ], &buffer, sizeof( buffer ) );
    }
    pthread_t tids[ NUM_THREADS ];
    for ( int ii = 0; ii < NUM_THREADS; ii++ )
    {
        pthread_create( &( tids[ ii ] ), NULL, threadFunc, NULL );
    }
    FILE *fp = ...
    for ( ;; )
    {
        char *linePtr;
        // get the pointer to a free buffer from the read end of the buffer pipe
        read( bufferPipe[ 0 ], &linePtr, sizeof( linePtr ) );
        // read a line from the current file into the buffer
        char *result = fgets( linePtr, MAX_LINE_LEN, fp );
        if ( result )
        {
            // send the line to the worker threads
            write( linePipe[ 1 ], &linePtr, sizeof( linePtr ) );
        }
        else
        {
            // nothing was read, so put the unused buffer back
            write( bufferPipe[ 1 ], &linePtr, sizeof( linePtr ) );
            // either end loop, or open another file
            fclose( fp );
            fp = fopen( ... );
            if ( !fp )
            {
                // no further file could be opened: end loop
                break;
            }
        }
    }
    // clean up and exit
    // send one NULL per worker thread to make each of them stop
    char *nullPtr = NULL;
    for ( int ii = 0; ii < NUM_THREADS; ii++ )
    {
        write( linePipe[ 1 ], &nullPtr, sizeof( nullPtr ) );
    }
    // wait for worker threads to stop
    for ( int ii = 0; ii < NUM_THREADS; ii++ )
    {
        pthread_join( tids[ ii ], NULL );
    }
    return( 0 );
}
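
The hand-off above works because a pipe carries the pointer value itself, not the line data. POSIX guarantees that a write of at most PIPE_BUF bytes to a pipe is atomic, so pointer-sized messages from different threads are never interleaved. Below is a minimal sketch of that hand-off with return values checked, which the code above omits for brevity. The helper names `sendPtr` and `recvPtr` are hypothetical and introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

// Send one pointer value through the write end of a pipe.
// Returns 0 on success, -1 on error or short write.
int sendPtr( int writeFd, const void *ptr )
{
    ssize_t n = write( writeFd, &ptr, sizeof( ptr ) );
    return ( n == ( ssize_t ) sizeof( ptr ) ) ? 0 : -1;
}

// Receive one pointer value from the read end of a pipe.
// Blocks until a pointer is available.
// Returns 0 on success, -1 on error, short read, or end of file.
int recvPtr( int readFd, void **ptr )
{
    assert( ptr != NULL );
    ssize_t n = read( readFd, ptr, sizeof( *ptr ) );
    return ( n == ( ssize_t ) sizeof( *ptr ) ) ? 0 : -1;
}
```

After `pipe( fds )`, `fds[ 0 ]` is the read end and `fds[ 1 ]` is the write end, so a producer would call `sendPtr( fds[ 1 ], buffer )` and a worker would call `recvPtr( fds[ 0 ], &p )`. The blocking `read` is what lets idle workers sleep until a line is available, without a mutex or condition variable.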