I am trying to read large data files (44GB - 63GB).
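The logic of what I am doing is to allocate 1GB of memory, read the file one GB at a time, split each GB into 1MB pieces, and run some generic hash over each 1MB piece (with some performance testing in between).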
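To make the splitting step concrete, here is a minimal sketch of how I picture walking one in-memory chunk in fixed-size slices. The names `slice_fn`, `for_each_slice`, and `add_length` are hypothetical helpers made up for this illustration, and the callback only adds up lengths where the real hash would go.

```c
#include <stddef.h>

/* Hypothetical callback type: receives one slice of the chunk. */
typedef void (*slice_fn)(const unsigned char *slice, size_t len, void *ctx);

/* Walk the `len` valid bytes of `chunk` in slices of at most `slice_size`
   bytes. The final slice may be shorter. Returns the number of slices. */
size_t for_each_slice(const unsigned char *chunk, size_t len,
                      size_t slice_size, slice_fn fn, void *ctx)
{
    size_t count = 0;
    if (slice_size == 0)
        return 0;
    for (size_t off = 0; off < len; off += slice_size) {
        size_t n = (len - off < slice_size) ? len - off : slice_size;
        fn(chunk + off, n, ctx);   /* slice is a pointer into the chunk */
        count++;
    }
    return count;
}

/* Example callback (stand-in for a hash): adds up the slice lengths. */
void add_length(const unsigned char *slice, size_t len, void *ctx)
{
    (void)slice;
    *(size_t *)ctx += len;
}
```

With a 1GB chunk and `slice_size` set to 1024*1024, each call to the callback would see one 1MB piece, and no second buffer or extra copy is needed.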
This is what I have so far (the code is at the end of this post):
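Any ideas on how to test this before I go further? Specifically, how can I tell whether my program is dumping the allocated 1GB of memory and then advancing until I finally reach the end of the file?
I know I still have a lot of work to do, but so far I am stuck on how to advance the memory allocation through the file and keep freeing that space until the file has been read and the performance test is done.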
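One check I could imagine, sketched below under my own assumptions: read the whole file through a single reusable buffer, count the bytes, and compare the total with the size reported by `stat`. The function name `read_in_chunks` is hypothetical. Trying it first with a small buffer on a small file seems easier than starting with 1GB.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `fp` to the end through one reusable buffer of `buf_size` bytes.
   Stores the number of non-empty reads in *chunks (if chunks is non-NULL).
   Returns the total bytes read, or (unsigned long long)-1 on failure. */
unsigned long long read_in_chunks(FILE *fp, size_t buf_size, size_t *chunks)
{
    unsigned long long total = 0;
    size_t n, count = 0;
    unsigned char *buf;
    int failed;

    if (buf_size == 0)
        return (unsigned long long)-1;
    buf = malloc(buf_size);
    if (buf == NULL)
        return (unsigned long long)-1;

    /* Each fread overwrites the previous chunk, so the buffer does not
       have to be freed and reallocated between reads. */
    while ((n = fread(buf, 1, buf_size, fp)) > 0) {
        total += n;
        count++;
    }
    failed = ferror(fp);   /* tell a read error apart from end of file */
    free(buf);
    if (failed)
        return (unsigned long long)-1;
    if (chunks != NULL)
        *chunks = count;
    return total;
}
```

If the returned total matches `st_size`, the loop reached the end of the file. The last read is normally shorter than the buffer, so the loop condition has to accept a partial read.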
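I want to time only the hashing across the whole file. I do not want to include the time it takes to read the file, only how long the hash takes to execute over the file's contents.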
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFFERSIZEGB ((size_t)1024*1024*1024) // how many bytes in one GB
#define BUFFERSIZEMB ((size_t)1024*1024) // how many bytes in one MB
int main(void)
{
    struct stat fileSize;
    char *buffGB, *hashString;
    FILE *fp;
    clock_t start, elapsed = 0; // elapsed accumulates hashing time only
    size_t bytesRead, offset, len;
    unsigned long long totalRead = 0;
    char fname[256];
    printf("Enter name of file:");
    if (fgets(fname, sizeof fname, stdin) == NULL)
    {
        printf("No file name given\n");
        exit(1);
    }
    fname[strcspn(fname, "\r\n")] = '\0'; // strip the trailing newline
    // handle file, open file, and read in binary form
    fp = fopen(fname, "rb");
    if (fp == NULL)
    {
        printf("Cannot open %s for reading\n", fname);
        exit(1);
    }
    if (stat(fname, &fileSize) != 0)
    {
        perror("stat");
        exit(1);
    }
    printf("Size of file: %lld\n", (long long)fileSize.st_size);
    // allocate one GB once; the same buffer is reused for every chunk
    buffGB = malloc(BUFFERSIZEGB);
    if (buffGB == NULL)
    {
        printf("Cannot allocate %zu bytes\n", BUFFERSIZEGB);
        exit(1);
    }
    // read the file one GB at a time; the last chunk is usually shorter,
    // so loop while fread returns any data instead of requiring a full GB
    while ((bytesRead = fread(buffGB, 1, BUFFERSIZEGB, fp)) > 0)
    {
        totalRead += bytesRead;
        // walk the chunk 1MB at a time; hashString points into buffGB,
        // so no second read (and no fgets, which is for text lines) is needed
        for (offset = 0; offset < bytesRead; offset += BUFFERSIZEMB)
        {
            hashString = buffGB + offset;
            len = bytesRead - offset < BUFFERSIZEMB ? bytesRead - offset : BUFFERSIZEMB;
            start = clock(); // time only the hashing, not the read
            // do some generic hash
            // hash the len bytes starting at hashString
            elapsed += clock() - start; // add this slice's time to the total
        }
    }
    if (ferror(fp))
    {
        printf("Error while reading %s\n", fname);
    }
    // totalRead should equal the file size if the whole file was processed
    printf("Read %llu bytes in total\n", totalRead);
    free(buffGB);
    fclose(fp); // close file
    printf("Hashing took %.2f seconds.\n", (double)elapsed/CLOCKS_PER_SEC);
    return 0;
}
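For the timing part, this is a rough sketch of what I think "time only the hash" could look like: wrap just the hash call in `clock()` and add the difference to a running total, so the `fread` time never enters the sum. The 32-bit FNV-1a function here is only a placeholder I picked for illustration, not the hash I intend to use, and `timed_hash` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Placeholder hash (32-bit FNV-1a), standing in for the real hash. */
uint32_t fnv1a(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hash one slice and add the CPU time it took to *hash_ticks.
   Reading the data happens outside this function and is not counted. */
uint32_t timed_hash(const unsigned char *p, size_t len, clock_t *hash_ticks)
{
    clock_t start = clock();
    uint32_t h = fnv1a(p, len);
    *hash_ticks += clock() - start;
    return h;
}
```

Dividing the accumulated ticks by `CLOCKS_PER_SEC` at the end gives seconds. `clock()` can be coarse on some platforms, so timing a whole 1GB chunk of hashing at once, rather than each 1MB slice, may give steadier numbers.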
Thanks in advance for your help.