Question

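I wrote a C program to count the words, characters, and lines in a text file. The program counts lines and words correctly, but it does not count the total number of characters correctly. I am using Git Bash on Windows, so I use the wc command to check my program's output. wc always reports x more characters than my program does, where x is the number of newline characters my program counts. Here is my program: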

#include <stdio.h>

#define IN 1  // getc is currently reading a word
#define OUT 0 // getc has finished a word and is now reading whitespace

int main()
{
    FILE *fp = fopen("lorum ipsum.txt","r");
    int lineCount = 0;
    int wordCount = 0;
    int charCount = 0;
    int c;
    int position = IN; // whether getc is inside a word (IN) or in whitespace (OUT)

    while((c=getc(fp)) != EOF)
    {
        if(c == '\n')
        {
            lineCount++;
        }
        if(c == '\n' || c == '\t' || c==' ')
        {
            if(position == IN) // means just finished reading the word
            {
                wordCount++;
                position = OUT; // is now reading the white spaces  
            }
        }
        else if(position == OUT)
        {
            //puts("This position is reached");
            position = IN; //currently reading the word
        }

        charCount++;
    }

    // printing to output
    return 0;
}

The code as a whole is not important here; what matters is that I increment the charCount variable in the while loop for every character read by getc.

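Also, I checked the size of the '\n' character with sizeof(); it is just a simple character and takes up 1 byte, so we should count it as one. Also, judging from the file size, wc is outputting the correct result. So the question is whether there is something wrong with the encoding my text file is stored in.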

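Note: every time I press ENTER to add a newline to the text file, the file size increases by 2, and so does the character count reported by wc, but the character count output by my program changes by only one.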

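Edit: Thanks to the good answers, I understand that there is an extra \r character in each newline. So when a file is opened in r mode, a newline is read as just \n; only in binary mode rb does the actual \r show up. Here is an answer about this behavior: what's the differences between r and rb in fopen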

Answer 1

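A Windows newline consists of two characters: \r, the carriage return, and \n, the line feed. By checking only for \n, you miss the \r characters.

For details, see What is the difference between \r and \n?.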
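
To see both characters, the stream has to be read without newline translation. The following is a minimal sketch, assuming the stream was opened in binary mode ("rb"); the struct byte_counts type and the count_bytes function are hypothetical names used only for illustration and are not part of the asker's program.

```c
#include <stdio.h>

/* Tallies gathered from one pass over a stream (hypothetical helper type). */
struct byte_counts {
    long total; /* every byte read */
    long cr;    /* '\r' bytes */
    long lf;    /* '\n' bytes */
};

/* Count bytes in a stream. fp is assumed to be opened with "rb",
   so that no newline translation takes place. */
struct byte_counts count_bytes(FILE *fp)
{
    struct byte_counts counts = {0, 0, 0};
    int c;

    while ((c = getc(fp)) != EOF) {
        counts.total++;
        if (c == '\r') {
            counts.cr++;
        } else if (c == '\n') {
            counts.lf++;
        }
    }
    return counts;
}
```

For a file with Windows line endings read this way, total should match the file size, and cr should account for the difference the asker observed.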

Answer 2

There are many ways to end a line. Current Mac OS and Linux use a single byte, but Windows uses a CR-LF pair, as it has done since DOS.

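When a file is opened in text mode, '\r\n' (or whatever the system's line ending is) is automatically converted to '\n' and counted only once. When printing with printf and some other functions, '\n' is converted to the system's newline. So you should open files in text mode unless you want to handle line endings yourself.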

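I don't know what your expected output is, but if you want to count each newline as 1, open the file as text. If you want to count each newline as 2, or count the number of bytes in the file, open it as binary. In general, I haven't seen anyone count a newline as 2, so you should open the file as text, as usual.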
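
If you do want to handle line endings yourself, one possible approach is to read the raw bytes and count each CR-LF pair as a single character. This is only a sketch of that idea; count_logical_chars is a hypothetical helper, and it assumes the file contents have already been read into a buffer in binary mode.

```c
#include <stddef.h>

/* Count characters in a raw buffer, treating each "\r\n" pair
   as one newline. A lone '\r' is still counted as a character. */
size_t count_logical_chars(const char *buf, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') {
            continue; /* skip the CR; the LF that follows is counted */
        }
        count++;
    }
    return count;
}
```

On the 6 bytes of "a\r\nb\r\n" this counts 4 characters, which is what text mode on Windows would report for the same file.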

Is the newline character counted twice?
