Is there a way to read a text file byte by byte using POSIX threads?

Time: 2018-12-22 16:41:27

Tags: c pthreads aio

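I am trying to use POSIX threads in C to read a file's contents asynchronously and copy them to another file. Suppose a file contains "aabbcc" and I have 4 threads: how can I copy "aabbcc" to another file asynchronously with threads in C? I have been stuck on this part all day. What I have so far is shown below.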

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <fcntl.h>
#include <pthread.h>
#include <aio.h>
#include <math.h> //for ceil() and floor()
#include <sys/types.h>
#include <unistd.h>

#define FILE_SIZE 1024 //in bytes

//>cc code.c -o code.out -lrt -lpthread
//>./code.out

char alphabets[52] = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o',
                    'p','q','r','s','t','u','v','w','x','y','z',
                    'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O',
                    'P','Q','R','S','T','U','V','W','X','Y','Z'};

/*Creates the input file with 1..FILE_SIZE random letters plus a newline.
  Returns the file size in bytes, or -1 on failure.*/
long prepareInputFile(char* filename)
{
    FILE *fp;
    fp = fopen(filename, "w");
    if(fp == NULL)
    {
        printf("Cannot write to input file\n");
        return -1; /*the function returns long, so a bare return is not valid*/
    }
    int index;
    int rand_size = (rand() % FILE_SIZE)+1;
    for(index = 0;index < rand_size;index++) /*Generate the file with random sizes in bytes*/
    {
        int num2 = (rand() % 52); /*Get a random char in char array*/
        putc(alphabets[num2],fp); /*Write that char to the file pointed to by fp*/
    }
    putc('\n',fp);
    long size = ftell(fp); /*Position after the last write is the file size*/
    fclose(fp); /*Flush and close so the threads see the complete file*/
    return size;
}
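//Arguments for one worker thread; pthread_create() passes a single void *
struct copyArgs
{
    char *src_file;
    char *dest_file;
    int thread;    //index of this thread
    int t_count;   //total number of threads
    long filesize; //size of the source file in bytes
};
//Perform main operation inside this function
void *writeToFileAsync(void *arg)
{
    struct copyArgs *args = arg;
    int readfd, writefd;
    struct aiocb write_cb, read_cb; //renamed so they do not hide aio_write()/aio_read()
    readfd = open(args->src_file, O_RDONLY);
    if(readfd < 0)
    {
        printf("Cannot open the file for reading\n");
        return NULL;
    }
    memset(&read_cb, 0, sizeof(read_cb));
    read_cb.aio_fildes = readfd; //set only after open() has succeeded
    read_cb.aio_nbytes = args->filesize/args->t_count;
    writefd = open(args->dest_file, O_CREAT | O_WRONLY, 0644); //O_CREAT needs a mode
    if(writefd < 0)
    {
        printf("Cannot open the file for writing\n");
        close(readfd);
        return NULL;
    }
    memset(&write_cb, 0, sizeof(write_cb));
    write_cb.aio_fildes = writefd;
    write_cb.aio_nbytes = args->filesize/args->t_count;
    //TODO: the actual read and write are not implemented yet
    close(readfd);
    close(writefd);
    return NULL;
}
int main(int argc, char *argv[])
{
    int i,threadCount;
    char sourcePath[100], destPath[100];
    if(argc < 4)
    {
        printf("Usage: %s <source dir or -> <dest dir or -> <thread count>\n", argv[0]);
        return 1;
    }
    if(strcmp(argv[1], "-") == 0)
    {
        //leave room in the buffer for the file name appended below
        if(getcwd(sourcePath, sizeof(sourcePath) - strlen("/source.txt")) == NULL)
        {
            printf("Cannot get the current directory\n");
            return 1;
        }
        strcat(sourcePath, "/source.txt");
    }
    else
    {
        //argv[1] is expected to be a directory ending in '/'
        snprintf(sourcePath, sizeof(sourcePath), "%ssource.txt", argv[1]);
    }
    printf("Source path is: %s\n", sourcePath);
    if(strcmp(argv[2], "-") == 0)
    {
        if(getcwd(destPath, sizeof(destPath) - strlen("/destination.txt")) == NULL)
        {
            printf("Cannot get the current directory\n");
            return 1;
        }
        strcat(destPath, "/destination.txt");
    }
    else
    {
        snprintf(destPath, sizeof(destPath), "%sdestination.txt", argv[2]);
    }
    printf("Dest path is: %s\n", destPath);
    threadCount = strtol(argv[3],NULL,10);
    if(threadCount < 1)
    {
        printf("Thread count must be at least 1\n");
        return 1;
    }
    long file_size = prepareInputFile(sourcePath);
    pthread_t threads[threadCount];
    struct copyArgs args[threadCount];
    for(i=0;i<threadCount;i++)
    {
        args[i].src_file = sourcePath;
        args[i].dest_file = destPath;
        args[i].thread = i;
        args[i].t_count = threadCount;
        args[i].filesize = file_size;
        if(pthread_create(&threads[i],NULL,writeToFileAsync,&args[i]) != 0)
        {
            printf("Cannot create thread %d\n", i);
            threadCount = i; //join only the threads that were started
            break;
        }
    }
    for(i=0;i<threadCount;i++)
    {
        pthread_join(threads[i],NULL); //otherwise main may return before the threads run
    }
    return 0;
}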

Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

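Parallelizing this operation is unlikely to help, because it is probably bound by I/O rather than CPU time, and copying this way will certainly not be faster than simply copying with system calls.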

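However, if you want to do it anyway, one approach is: map the input file into memory (with mmap() or an equivalent), create a destination buffer or memory-mapped file, split the source and destination into equal slices, and have each thread copy its own slice. You could use memcpy(), but a modern compiler can also see what a plain copy loop is doing and optimize it.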
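A minimal sketch of the mmap() approach described above, under some assumptions: the names parallel_copy, copy_mapped, copy_slice and struct slice are hypothetical, the last thread takes whatever remainder is left over, and error handling is kept to the bare minimum.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* One slice of the copy, handed to a single thread (hypothetical type). */
struct slice {
    const char *src;
    char *dst;
    size_t len;
};

/* Thread body: copy this thread's slice from the source to the destination mapping. */
static void *copy_slice(void *arg)
{
    struct slice *s = arg;
    memcpy(s->dst, s->src, s->len);
    return NULL;
}

/* Map both files, split them into nthreads slices and copy the slices in parallel. */
static int copy_mapped(int in, int out, size_t size, int nthreads)
{
    int started = 0;
    char *src = mmap(NULL, size, PROT_READ, MAP_PRIVATE, in, 0);
    if (src == MAP_FAILED) return -1;
    char *dst = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, out, 0);
    if (dst == MAP_FAILED) { munmap(src, size); return -1; }

    pthread_t tid[nthreads];
    struct slice sl[nthreads];
    size_t chunk = size / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        size_t off = (size_t)i * chunk;
        sl[i].src = src + off;
        sl[i].dst = dst + off;
        /* the last thread also copies the remainder */
        sl[i].len = (i == nthreads - 1) ? size - off : chunk;
        if (pthread_create(&tid[i], NULL, copy_slice, &sl[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);
    munmap(src, size);
    munmap(dst, size);
    return started == nthreads ? 0 : -1;
}

/* Copy src_path to dst_path with nthreads threads; returns 0 on success, -1 on error. */
int parallel_copy(const char *src_path, const char *dst_path, int nthreads)
{
    struct stat st;
    int rc = -1;
    if (nthreads < 1) return -1;
    int in = open(src_path, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    /* the destination must already have its final size before it is mapped */
    if (fstat(in, &st) == 0 && ftruncate(out, st.st_size) == 0) {
        size_t size = (size_t)st.st_size;
        rc = (size == 0) ? 0 : copy_mapped(in, out, size, nthreads);
    }
    close(in);
    close(out);
    return rc;
}
```

With the question's example, parallel_copy("source.txt", "destination.txt", 4) on a file containing "aabbcc" would give the first three threads one byte each and the fourth thread the remaining three. Because every thread writes a disjoint range, the threads need no locking.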

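Even this will not be as fast as reading or mapping the source file into a buffer and then writing it back out from that same buffer with write(). If all you want is to copy a file on disk, you do not need to copy bytes between buffers at all. In fact, you could simply create another link to the file on disk.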
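For comparison, here is a sketch of the single-threaded alternative: read the source into one buffer and write it out from that same buffer. The function name simple_copy and the 64 KiB buffer size are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy src_path to dst_path through one buffer; returns 0 on success, -1 on error. */
int simple_copy(const char *src_path, const char *dst_path)
{
    char buf[65536];
    int rc = 0;
    ssize_t n;
    int in = open(src_path, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        /* write() may write fewer bytes than requested, so loop until all are out */
        while (n > 0) {
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR) continue;   /* interrupted, retry */
                rc = -1;
                break;
            }
            p += w;
            n -= w;
        }
        if (rc < 0) break;
    }
    if (n < 0) rc = -1;   /* read() failed */
    close(in);
    if (close(out) < 0) rc = -1;
    return rc;
}
```

The hard link mentioned above would be a single call to link(src_path, dst_path) from <unistd.h>. Note that a link gives the same file a second name rather than an independent copy, so it only fits when the two names are allowed to share contents.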

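This will probably work best if the slices are aligned on page boundaries. Be very careful about two threads writing to the same cache line: that causes contention between cores (false sharing), and if the slices ever overlap it is a real race condition.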
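One way to get page-aligned slices is to divide the file by whole pages instead of by bytes. The helper below is a hypothetical sketch (slice_start is not a library function): slice i covers the bytes from slice_start(size, n, i, page) up to, but not including, slice_start(size, n, i + 1, page).

```c
#include <stddef.h>

/* Byte offset at which slice i of nslices begins. Every start with i < nslices is a
   multiple of page, so no two slices share a page (or a cache line).
   Hypothetical helper for illustration. */
size_t slice_start(size_t size, int nslices, int i, size_t page)
{
    if (i <= 0) return 0;
    if (i >= nslices) return size;            /* end of the last slice */
    size_t pages = (size + page - 1) / page;  /* total pages; the last may be partial */
    size_t start = (pages * (size_t)i / (size_t)nslices) * page;
    return start < size ? start : size;
}
```

The page size would normally come from sysconf(_SC_PAGESIZE). When the file has fewer pages than there are threads, some slices come out empty, which is harmless: those threads simply have nothing to copy.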