Question

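OK, I have been reading up on fread() [which returns type size_t] and have seen several posts about large files and some related issues, but I am still having problems. This function is passed a file pointer and a long long int. The long long comes from main, where I use another function to get the actual file size, which is 6448619520 bytes.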

char *getBuffer(FILE *fptr, long long size) {
    char *bfr;
    size_t result;

    printf("size of file in allocate buffer:  %lld\n", size);
        //size here is 6448619520


    bfr = (char*) malloc(sizeof(char) * size);
    if (bfr == NULL) {
        printf("Error, malloc failed..\n");
        exit(EXIT_FAILURE);
    }
        //positions fptr to offset location which is 0 here.
    fseek(fptr, 0, SEEK_SET);
        //read the entire input file into bfr
    result = fread(bfr, sizeof(char), size, fptr);
    printf("result = %lld\n",  (long long) result);


    if(result != size)
    {
        printf("File failed to read\n");
        exit(5);
    }
    return (bfr);

}

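I have tested this with files of around 1-2 GB and it works fine. However, when I test it on a 6 GB file, nothing is read into the buffer. Ignore the other output and focus on the "result" line; the problem lies in reading the data into bfr. Here are some of the results I get.

First, a file of 735844352 bytes (700+ MB):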


root@redbox:/data/projects/C/stubs# ./testrun -x 45004E00 -i /data/Helix2008R1.iso


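Image file is /data/Helix2008R1.iso
  hex string = 45004E00
  > Total size of file: 735844352
  Size of file in get buffer: 735844352
  result = 735844352
  Begin parsing the command line hex value: 45004E00
  Total number of bytes in hex string: 4


Results of the hex string search:
  Found hex string 45004E00 at byte location: 37441
  Found hex string 45004E00 at byte location: 524768
  ....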

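Run #2, against a 6 GB file:

root@redbox:/data/projects/C/stubs# ./testrun -x BF1B0650 -i /data/images/sixgbimage.img

Image file is /data/images/sixgbimage.img
hex string = BF1B0650
Total size of file: 6448619520
size of file in allocate buffer: 6448619520
result = 0
File failed to read

I am still not sure why it fails with large files but not with smaller ones. Is it a >4 GB issue? I am using the following: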

/* Support Large File Use */
#define _LARGEFILE_SOURCE 1
#define _LARGEFILE64_SOURCE 1
#define _FILE_OFFSET_BITS   64

BTW, I am using an Ubuntu 9.10 box (2.6.x kernel). TIA.

Answer 1

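If you are only reading the file and not modifying it, I suggest using mmap(2) instead of fread(3). This should be more efficient, though I have not tried it on huge files. You will need to change my very simplistic found/not-found reporting to report offsets if that is what you would rather have, but I am not sure what you want the pointer for. :)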

#define _GNU_SOURCE
#include <string.h>

#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>


int main(int argc, char* argv[]) {
    char *base, *found;
    off_t len;
    struct stat sb;
    int ret;
    int fd;
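    /* Stored in host byte order: on little-endian x86 this matches the
     * byte sequence 00 4E 00 45 in the file, not 45 00 4E 00. */
    unsigned int needle = 0x45004E00;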

    ret = stat(argv[1], &sb);
    if (ret) {
            perror("stat");
            return 1;
    }

    len = sb.st_size;

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
            perror("open");
            return 1;
    }

    base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    /* mmap reports failure with MAP_FAILED, not NULL */
    if (base == MAP_FAILED) {
            perror("mmap");
            return 1;
    }

    found = memmem(base, len, &needle, sizeof(unsigned int));
    if (found)
            printf("Found %X at %p\n", needle, (void *)found);
    else
            printf("Not found\n");
    return 0;
}

Some tests:

$ ./mmap ./mmap
Found 45004E00 at 0x7f8c4c13a6c0
$ ./mmap /etc/passwd
Not found

Answer 2

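If this is a 32-bit process, as you say, then size_t is 32 bits and you simply cannot store more than 4 GB in your process's address space (in practice, a little less than 3 GB). In this line: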

bfr = (char*) malloc(sizeof(char) * size);

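the result of the multiplication is reduced modulo SIZE_MAX + 1, which means it will only try to allocate around 2 GB. The same thing happens to the size parameter in this line: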

result = fread(bfr, sizeof(char), size, fptr);

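If you want to process large files in a 32-bit process, you have to work on them one part at a time (for example, read the first 100 MB, process it, read the next 100 MB, and so on). You cannot read the entire file in one go; your process simply does not have enough memory available.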

Answer 3

When fread fails, it sets errno to indicate why it failed. What is the value of errno after the call to fread that returns zero?

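Update: Do you need to read the whole file in one fell swoop? What happens if you read the file in, say, 512 MB chunks?

According to your comment above, you are using a 32-bit OS. In that case you will not be able to handle 6 GB at a time (for one thing, size_t cannot hold a number that large). You should, however, be able to read in and process the file in smaller chunks.

I would argue that reading a 6 GB file into memory is probably not the best solution to your problem even on a 64-bit OS. What exactly are you trying to accomplish that requires buffering a 6 GB file? There is probably a better way to approach the problem.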

Answer 4

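Have you verified that malloc and fread are actually being passed parameters of the right type? You may want to compile with the -Wall option and check whether your 64-bit value is actually being truncated. In that case malloc will not report an error, but it will end up allocating far less than you asked for.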

Answer 5

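After taking everyone's advice, I broke the 6 GB file into 4K chunks, parsed the hex bytes, and was able to get the byte locations, which will help me later pull the MBR out of a dd-imaged VMFS partition. Here is the quick and dirty way of reading it chunk by chunk: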

#define DEFAULT_BLOCKSIZE 4096
...

while((bytes_read = fread(chunk, sizeof(unsigned char), sizeof(chunk), fptr)) > 0) {
    chunkptr = chunk;
    for(z = 0; z < bytes_read; z++) {
        if (*chunkptr == pattern_buffer[current_search]) {
            current_search++;
            if (current_search > (counter - 1)) {
                current_search = 0;
                printf("Hex string %s was found at starting byte location:  %lld\n",
                       hexstring, (long long int) (offsetctr-1));
                matches++;
            }
        } else {
            current_search = 0;
        }
        chunkptr++;
        //printf("[%lld]: %02X\n", offsetctr, chunk[z] & 0xff);
        offsetctr++;
    }
    master_counter += bytes_read;
}

...

Here are the results I got...

root@redbox:~/workspace/bytelocator/Debug# ./bytelocator -x BF1B0650 -i /data/images/sixgbimage.img 

Total size of /data/images/sixgbimage.img file:  6448619520 bytes
Parsing the hex string now: BF1B0650

Hex string BF1B0650 was found at starting byte location:  18
Hex string BF1B0650 was found at starting byte location:  193885738
Hex string BF1B0650 was found at starting byte location:  194514442
Hex string BF1B0650 was found at starting byte location:  525033370
Hex string BF1B0650 was found at starting byte location:  1696715251
Hex string BF1B0650 was found at starting byte location:  1774337550
Hex string BF1B0650 was found at starting byte location:  2758859834
Hex string BF1B0650 was found at starting byte location:  3484416018
Hex string BF1B0650 was found at starting byte location:  3909721614
Hex string BF1B0650 was found at starting byte location:  3999533674
Hex string BF1B0650 was found at starting byte location:  4018701866
Hex string BF1B0650 was found at starting byte location:  4077977098
Hex string BF1B0650 was found at starting byte location:  4098838010


Quick stats:
================
Number of bytes that have been read:  6448619520
Number of signature matches found:  13
Total number of bytes in hex string:  4

fread() fails on a 6 GB file
