Question

Introduction: I'm following 'The C Programming Language' by K&R and I came across a section of the following code that I'm unable to comprehend.

The Code:

#include <stdio.h>

int main()
{
    int c, i, nwhite, nother;

    int ndigit[10];

    nwhite = nother = 0;
    for (i = 0; i < 10; i++)
    {
        ndigit[i] = 0;
    }

    while ((c = getchar()) != EOF)
    {
        if (c >= '0' && c <= '9')
            ++ndigit[c - '0'];
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;
    }

    printf("digits = ");

    for (i = 0; i < 10; i++)
    {
        printf(" %d", ndigit[i]);
    }

    printf(", white space = %d, other = %d\n", nwhite, nother);
}

Sample Input:

sample text for stackoverflow
try 123
123456789

Output:

digits =  0 2 2 2 1 1 1 1 1 1, white space = 7, other = 29
Program ended with exit code: 0

I understand what the code is doing but I'm unable to understand how it's doing it. Why have we used ++ndigit[c - '0']; in the code?

Answer 1

The part of the algorithm you seem to be having the most trouble with is the calculation of the digit frequencies, specifically the indexing used to do it.

The algorithm works by using a counter array. Specifically, this:

int ndigit[10];

is where the counts are stored. Before any input is read, the for-loop that follows zero-fills the array:

for (i = 0; i < 10; i++)
{
    ndigit[i] = 0;
}

During processing, each character is read from stdin and tested against the following categories, in order:

Something greater than or equal to the character literal '0' and less than or equal to the character literal '9'. In this case it is considered a digit character. More on this later.
One of a space, a newline, or a tab. These are considered whitespace characters.
Anything else.
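The three tests above can be sketched as a small standalone C function. The names classify, enum char_class, and the CLASS_* constants are hypothetical, made up for this illustration; the tests themselves are copied from the loop in the question.

```c
/* Hypothetical names, chosen only for this sketch. */
enum char_class { CLASS_DIGIT, CLASS_WHITE, CLASS_OTHER };

/* Applies the same three tests as the K&R loop, in the same order. */
enum char_class classify(int c)
{
    if (c >= '0' && c <= '9')
        return CLASS_DIGIT;   /* the program does ++ndigit[c - '0'] */
    else if (c == ' ' || c == '\n' || c == '\t')
        return CLASS_WHITE;   /* the program does ++nwhite */
    else
        return CLASS_OTHER;   /* the program does ++nother */
}
```

Note that only those three whitespace characters are tested, so a carriage return ('\r'), for instance, falls through to the last branch and is counted as "other".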

The first of these is the most complicated part of the algorithm. Characters are integers in C: char is an integer type, and a character literal such as '0' actually has type int. You can therefore do arithmetic on them, but that arithmetic only makes sense where the language standard guarantees something about the character codes involved.

The language standard mandates that the character codes of the decimal digits be contiguous on every conforming platform: the code of each digit is exactly one greater than the code of the digit before it. The standard ASCII character encoding uses byte values 48 through 57 (hex 0x30 through 0x39) for '0' through '9'. See asciitable.com for the full skinny on ASCII encodings. Other encodings may use different values. For example, standard EBCDIC (used primarily on IBM minis and mainframes at this point) uses values 240 through 249 (hex 0xF0 through 0xF9). While this may seem trivial, it is essential to how the code you're presenting works: whatever the encoding, ASCII or otherwise, the digit characters are contiguous (all together and in sequence), because the language standard mandates it.
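If you want to see the guarantee for yourself, a minimal sketch like the following checks exactly the property the counting code relies on, without assuming ASCII or EBCDIC values. The function name digits_are_contiguous is hypothetical.

```c
/* Checks the property ++ndigit[c - '0'] depends on: subtracting '0'
   from each digit character yields that digit's numeric value. */
int digits_are_contiguous(void)
{
    const char digits[] = "0123456789";
    int d;

    for (d = 0; d < 10; d++) {
        if (digits[d] - '0' != d)
            return 0;   /* would indicate a non-conforming character set */
    }
    return 1;           /* the result on any conforming implementation */
}
```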

So why should that matter? It matters because it allows you to do things like this (using ASCII as the example from here on):

int x = '5' - '0';

The result is that x will be 5. Note that I did not say '5' (the character); I said the integer number 5. That's because, with ASCII encoding, the integer calculation you're actually getting is:

//      '5'  '0'
int x = 53 - 48;
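The same subtraction can be wrapped in a small helper that also rejects non-digits. This is only a sketch: digit_value and its -1 return value for non-digits are hypothetical choices for illustration, not something from K&R.

```c
/* Hypothetical helper: converts a digit character to its integer value,
   or returns -1 if c is not one of '0' through '9'. */
int digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';   /* '5' - '0' is 5 (53 - 48 in ASCII) */
    return -1;
}
```

The range check matters: without it, a character like 'a' would produce an index far outside a ten-element array.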

You often hear people talk about "magic number programming", and hard-coding character codes like 48 is a classic example of it. When you see code that says:

if (c == 48)    /* magic number: 48 is the ASCII code for '0' */
    ;           /* do something with the char because it's a zero character */

Don't do that. Keep the code clean and use the character literal instead; it is far easier to read

if (c == '0')

and the intent is clearer.

So how does this play into your code? It plays in here:

if (c >= '0' && c <= '9')
    ++ndigit[c - '0'];

This, using ASCII encoding equivalents, would actually look like this:

if (c >= 48 && c <= 57)
    ++ndigit[c - 48];

As I said before, don't write code like that; it is shown only to illustrate what is actually happening. The character literals are int values, and those values take part in the arithmetic that calculates the index into the counter array.
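A quick way to confirm that character literals really are int values is to compare sizes. The sketch below assumes the code is compiled as C (in C++ a character literal has type char, so the first check would fail there); the function names are hypothetical.

```c
/* In C, a character constant such as '7' has type int,
   so sizeof('7') equals sizeof(int) rather than 1. */
int char_constant_is_int(void)
{
    return sizeof('7') == sizeof(int);
}

/* The index expression from the loop: plain int arithmetic on codes. */
int index_for(int c)
{
    return c - '0';   /* for c == '7' this is 7, so ndigit[7] is bumped */
}
```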

For example, suppose c is '7' (note: the character, not the integer). Then this:

if (c >= '0' && c <= '9')
    ++ndigit[c - '0'];

is equivalent to:

if ('7' >= '0' && '7' <= '9')
    ++ndigit['7' - '0'];

which ultimately is equivalent to:

if (55 >= 48 && 55 <= 57)
    ++ndigit[55 - 48];

and since 55 - 48 is simply 7, the final indexing is:

    ++ndigit[7];

This is done for every digit character in the input, and at the end the accumulated count for each digit is printed.
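Here is a minimal sketch of just the digit-counting part, run over a string instead of stdin so it is easy to test. count_digits is a hypothetical name; the zero-fill and the ++counts[c - '0'] step mirror the original loop.

```c
/* Hypothetical helper: the digit-counting part of the K&R loop, reading
   from a string instead of stdin. counts[d] ends up holding how many
   times the digit d appears in s. */
void count_digits(const char *s, int counts[10])
{
    int i;

    for (i = 0; i < 10; i++)
        counts[i] = 0;              /* same zero-fill as ndigit */

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;  /* non-negative code, like getchar() */
        if (c >= '0' && c <= '9')
            ++counts[c - '0'];      /* '7' - '0' == 7, so '7' bumps counts[7] */
    }
}
```

Fed the sample input from the question, it yields the same digit counts as the program's output: 0 2 2 2 1 1 1 1 1 1.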

I hope that clears it up. The rest I assume you understand.

Answer 2

That C code analyzes the input text and gives you some information about it.

digits =  0 2 2 2 1 1 1 1 1 1

These are the counts of 0s, 1s, 2s, 3s, 4s, 5s, 6s, 7s, 8s, and 9s.

white space = 7

This is the number of spaces, line breaks, and tabs.

other = 29

This is the number of characters that are neither digits nor whitespace.
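To see where each field of that line comes from, here is a sketch that runs the same three-way count over a string and formats the summary the way the program's printf calls do (minus the trailing newline). summarize is a hypothetical helper; it assumes size is greater than zero, and a line that doesn't fit in the buffer is simply truncated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: same three-way count as the K&R loop, but over a
   string, with the summary line written into buf. Assumes size > 0. */
void summarize(const char *s, char *buf, size_t size)
{
    int ndigit[10] = {0};
    int nwhite = 0, nother = 0;
    int i;
    size_t used;

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;
        if (c >= '0' && c <= '9')
            ++ndigit[c - '0'];
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;
    }

    /* Build the line piece by piece, mirroring the program's printf calls. */
    snprintf(buf, size, "digits = ");
    for (i = 0; i < 10; i++) {
        used = strlen(buf);
        snprintf(buf + used, size - used, " %d", ndigit[i]);
    }
    used = strlen(buf);
    snprintf(buf + used, size - used, ", white space = %d, other = %d",
             nwhite, nother);
}
```

With the question's sample input (each line ending in a newline), buf holds "digits =  0 2 2 2 1 1 1 1 1 1, white space = 7, other = 29", including the double space after "=" that comes from "digits = " followed by " %d".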

Answer 3

The code prints the number of occurrences of each digit (0 to 9) in the input.

In your input:

sample text for stackoverflow
try 123
123456789

There are:
0 occurrences of number '0'
2 occurrences of number '1'
2 occurrences of number '2'
2 occurrences of number '3'
1 occurrence of number '4'
1 occurrence of number '5'
1 occurrence of number '6'
1 occurrence of number '7'
1 occurrence of number '8'
1 occurrence of number '9'

Hence the output is digits = 0 2 2 2 1 1 1 1 1 1

white space = 7 gives the number of spaces, tabs, and line breaks.
other = 29 gives the number of characters that are neither digits nor whitespace.

As for ++ndigit[c - '0'];, we use c - '0' rather than c - 48 (48 being the ASCII value of '0'). The result is the same under ASCII, but it avoids the magic number 48 (using magic numbers is usually bad practice).

Answer 4

The ndigit array records the frequency of each digit entered. You didn't type any 0, so the count for digit 0 is zero; you typed the digit 1 twice, so the count at position 1 is 2, and so on.

The nwhite variable holds the count of spaces, tabs, and newlines (the Enter/Return key).

The nother variable holds the count of everything else.

Answer 5

0 2 2 2 1 1 1 1 1 1 are the counts of each digit (0 zeros, 2 ones, 2 twos, 2 threes, 1 four, 1 five, ...).
29 is the number of characters that are neither digits nor whitespace.

c - '0' is a common way to get the integer value of a digit character (in ASCII, its code minus 48). For example:

'0' - '0' == 0
'1' - '0' == 1
'9' - '0' == 9
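The idiom also shows up when converting a whole string of digits to a number: each step multiplies the running value by 10 and adds c - '0'. A minimal sketch, with a hypothetical function name and no overflow checking:

```c
/* Hypothetical helper: builds a value from the leading digit characters
   of s, stopping at the first non-digit. No overflow checking, so keep
   inputs short. */
unsigned long parse_digits(const char *s)
{
    unsigned long value = 0;

    for (; *s >= '0' && *s <= '9'; s++)
        value = value * 10 + (unsigned long)(*s - '0');  /* '3' adds 3 */
    return value;
}
```

For "123" this computes ((0 * 10 + 1) * 10 + 2) * 10 + 3 = 123.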

How is the C code functioning?
