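How do I read a large (593683 bytes) file in C?

Time: 2018-02-05 08:07:59

Tags: c file

I am trying to read a file that can be up to 4 GB in size, but the program always skips ahead and detects EOF before it actually occurs. I am trying to read the file with the following code: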

    #include <stdio.h>
    #include<stdlib.h>
    #include<stdbool.h>
    #define buffer 8000

   int main(int argc, char* argv[]){
    FILE *in = fopen(argv[1],"rb");
    if(!in){
        printf("File open failed %p\n",in);
        return 0;
    }
    char* tmp = malloc(buffer*sizeof(char));
    int filesize = 0;
    while(1){

        filesize = fread(tmp,1,buffer,in);
        if(filesize == buffer){
            printf("%d] %s\n\n",filesize,tmp);
        }else if(filesize < buffer){
            printf("%d] %s\n\n",filesize,tmp);
            break;
        }
    }
    free(tmp);
    fclose(in);
    return 0;
 }

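Last buffer:

1683] e missing vessel tell ......

After that it prints [note that it does not print the file size]:

  might speak to me. A single gesture of his could destroy me, a single word chain me on board. But ten was about to strike. The moment had come for me to leave my room and join my companions...... EOF detected at 1683!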

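1 answer:

Answer 0 (score: 2)

The code is basically working correctly: the last buffer really is 1683 bytes long, and it really does start with e missing vessel. Observe the result of running: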

$ tail -c 1683 2000010.txt
e missing vessel tell us by its nationality that of Captain Nemo?

I hope so.  And I also hope that his powerful vessel has conquered
the sea at its most terrible gulf, and that the Nautilus has survived
where so many other vessels have been lost!  If it be so--if Captain
Nemo still inhabits the ocean, his adopted country, may hatred be
appeased in that savage heart!  May the contemplation of so many wonders
extinguish for ever the spirit of vengeance!  May the judge disappear,
and the philosopher continue the peaceful exploration of the sea!
If his destiny be strange, it is also sublime.  Have I not understood
it myself?  Have I not lived ten months of this unnatural life?
And to the question asked by Ecclesiastes three thousand years ago,
"That which is far off and exceeding deep, who can find it out?"
two men alone of all now living have the right to give an answer----

CAPTAIN NEMO AND MYSELF.


The end of Project Gutenberg etext of "Twenty Thousand Leagues
Under the Sea"


I have made the following changes to the text:

PAGE LINE ORIGINAL CHANGED TO
  32    36  mizen-mast             mizzen-mast
  66     5  Arronax                Aronnax
  87    33  zoophites              zoophytes
  89    22  aparatus               apparatus
  96    28  dirunal                diurnal
  97     8  Arronax                Aronnax
 123    23  porphry                porphyry
 141     8  Arronax                Aronnax
 146    30  sideral                sidereal
 177    30  Arronax                Aronnax
 223     4  commmit                commit
 258    16  swiftiest              swiftest
 274     2  occured                occurred


 $

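You could add a cumulative size counter to the code and print that value along with the current buffer size. I modified the code to print only the first 30 characters of each block. You could also print the block number: there are 74 blocks of 8000 characters, which is 592,000, plus one of 1,683, for a total of 593,683. You print the data without any length constraint, so after the end of the last block the buffer still holds 8000 - 1683 = 6317 stray characters left over from the penultimate block, and %s prints them too. fread() does not add a null terminator: you are not reading strings, you are reading blocks of bytes.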
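To make the stray-character effect easier to see, here is a minimal sketch that uses a deliberately tiny buffer. The names CHUNK and read_last_block are hypothetical and chosen only for this illustration; they are not part of the code in the question.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 8   /* tiny on purpose so the effect is easy to see */

/* Reads `in` in CHUNK-byte blocks and leaves the final block in `last`,
 * which must hold CHUNK + 1 bytes. Returns the length of that block.
 * When `terminate` is nonzero, a '\0' is stored right after the bytes
 * read, which fread() never does on its own. */
size_t read_last_block(FILE *in, char last[CHUNK + 1], int terminate)
{
    size_t n;

    last[CHUNK] = '\0';   /* guard byte so this demo never overruns */
    do {
        n = fread(last, 1, CHUNK, in);
        if (terminate)
            last[n] = '\0';
    } while (n == CHUNK);
    return n;
}
```

For an 11-byte file containing ABCDEFGHijk, the last read returns 3 bytes. Without termination the buffer reads as ijkDEFGH (3 new bytes followed by 5 stale ones); with termination it reads as ijk. The 6317 stray characters in the question arise the same way.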

With the counting changes:

#include <stdio.h>
#include <stdlib.h>

#define buffer 8000

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *in = fopen(argv[1], "rb");
    if (!in)
    {
        fprintf(stderr, "File open failed %s\n", argv[1]);
        return 1;  /* failure: report a nonzero exit status */
    }
    char *tmp = malloc(buffer * sizeof(char));
    if (tmp == 0)
    {
        fprintf(stderr, "Memory allocation failed for %d bytes\n", buffer);
        return 1;
    }
    int filesize = 0;
    int cum_size = 0;
    int block_no = 0;
    while (1)
    {
        filesize = fread(tmp, 1, buffer, in);
        cum_size += filesize;
        block_no++;
        printf("%2d:%6d:%4d] %.30s\n", block_no, cum_size, filesize, tmp);
        if (filesize < buffer)
            break;
    }
    free(tmp);
    fclose(in);
    return 0;
}

The output begins:

 1:  8000:8000] **The Project Gutenberg Etext 
 2: 16000:8000] ndemnify and hold the Project,
 3: 24000:8000] he question seemed buried,
ne
 4: 32000:8000] rrival in New York several per
 5: 40000:8000] boy, who had accompanied
me i
 6: 48000:8000]  calls himself Canadian calls 
 7: 56000:8000] lp in chasing
a whale they ha

and ends:

"Exactly, Consei
68:544000:8000] e our arrival on board, and wh
69:552000:8000] e have entered upon it, let us
70:560000:8000] ric cable lying
on the bottom
71:568000:8000] ook part in the battle of Comt
72:576000:8000]  coldly.  "And I advise you no
73:584000:8000] e of guns and the netting.  Th
74:592000:8000]  madman; fortunately I resiste
75:593683:1683] e missing vessel tell us by it
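The block numbers and sizes in this output follow from integer division and remainder. As a small sketch (reads_needed and last_read_size are hypothetical helper names, not part of the program above):

```c
#include <stdio.h>

/* Number of fread() calls the loop makes for a file of `size` bytes read
 * in `chunk`-byte blocks: one per full block, plus one final short
 * (possibly empty) read that ends the loop. */
unsigned long long reads_needed(unsigned long long size,
                                unsigned long long chunk)
{
    return size / chunk + 1;
}

/* Length of that final short read. */
unsigned long long last_read_size(unsigned long long size,
                                  unsigned long long chunk)
{
    return size % chunk;
}
```

For size 593683 and chunk 8000 this gives 75 reads, the last of which is 1683 bytes long, matching the final line of the output.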

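By explicitly limiting the number of characters printed from each block, I mostly sidestepped the missing null terminator. Strictly speaking, though, you should make sure you never try to read beyond the end of the tmp buffer. If I wanted to print everything that was read into the buffer, I should use: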

printf("%2d:%6d:%4d] %.*s\n", block_no, cum_size, filesize, filesize, tmp);

In your original code, you could use:

printf("%d] %.*s\n", filesize, filesize, tmp);
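Since the question mentions files of up to 4 GB, a variant with wider counters may also be worth considering. The following is only a sketch under a few assumptions: dump_stream and CHUNK_SIZE are hypothetical names, the byte count is kept in an unsigned long long (at least 64 bits) so it cannot overflow the way a 32-bit int would past 2 GiB, and fwrite() is used in place of printf() so that exactly the bytes read are written and no terminator is needed.

```c
#include <stdio.h>

#define CHUNK_SIZE 8000

/* Hypothetical helper: copies everything from `in` to `out` in
 * CHUNK_SIZE-byte blocks. Stores the number of bytes copied in *total
 * and returns 0 on success, -1 on a read or write error. */
int dump_stream(FILE *in, FILE *out, unsigned long long *total)
{
    char buf[CHUNK_SIZE];
    size_t n;   /* fread() returns size_t, not int */

    *total = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        /* fwrite() emits exactly n bytes and never looks for a '\0' */
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        *total += n;
    }
    /* fread() returns 0 at end of file and on error; ferror() tells them apart */
    return ferror(in) ? -1 : 0;
}
```

A caller could open the file with fopen(argv[1], "rb") as before and pass stdout as `out`. Whether the C library can open files larger than 2 GiB at all depends on the platform, so this addresses only the counter and the printing.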