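C code parser using else-if

Date: 2016-10-14 07:31:04

Tags: c regex parsing

For the following question:

Exercise 12347 - Write a program that reads the source code of a C program from its standard input and prints all the starred items in the following program statistics (all are integers). (Note the comment about tab characters at the end of this specification.)

Print out the following values: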

  Lines:
  *  The total number of lines
  *  The total number of blank lines
    (Any lines consisting entirely of white space should be
    considered as blank lines.)
    * The percentage of blank lines (100 * blank_lines / lines)

  Characters:
     *  The total number of characters after tab expansion
     *  The total number of spaces after tab expansion
     *  The total number of leading spaces after tab expansion
     (These are the spaces at the start of a line, before any visible
    character; ignore them if there are no visible characters.)
    * The average number of characters per line
    * characters per line ignoring leading spaces
    * leading spaces per line
    * spaces per line ignoring leading spaces

  Comments:
    *  The total number of comments in the program
    *  The total number of characters in the comments in the program
        excluding the "/*" and "*/" themselves
    * The percentage of number of comments to total lines
    * The percentage of characters in comments to characters

Identifiers:
   * We are concerned with all the occurrences of "identifiers" in the
     program where each part of the text starting with a letter,
     and continuing with letter, digits and underscores is considered
      to be an identifier, provided that it is not in a comment, or in a string, or within primes.
    Note that "abc\"def"
    the internal escaped quote does not close the string.
    Also, the representation of the escape character is '\\'
      and of prime is '\''
  Do not attempt to exclude the fixed words of the language,
  treat them as identifiers. Print
*  The total number of identifier occurrences.
*  The total number of characters in them.
*   The average identifier length.

  Indenting:
   *  The total number of times either of the following occurs:
      a line containing a "}" is more indented than the preceding line
      a line is preceded by a line containing a "{" and is less
      indented than it.
      The "{" and "}" must be ignored if in a comment or string or
      primes, or if the other line involved is entirely comment.
   * A single count of the sum of both types of error is required.
  NOTE: All tab characters ('\t') on input should be interpreted as multiple spaces using the rule:
    "move to the next modulo 8 column"
     where the first column is numbered column 0.

  col before tab | col after tab
        ---------------+--------------
                0      |      8
                1      |      8
                7      |      8
                8      |     16
                9      |     16
               15      |     16
               16      |     24
    To read input a character at a time the skeleton has code incorporated to read a line at a time for you using
   char ch;
   ch = getchar();
   Which will deliver each character exactly as read. The "getline" function then puts the line just read in the global array of characters "linec", null terminated, and delivers the length of the line, or a negative value if end of data has been encountered.
   You can then look at the characters just read with (for example)

   switch( linec[0] ) {
    case ' ': /* space ..... */
            break;
    case '\t': /* tab character .... */
            break;
    case '\n': /* newline ... */
            break;
    ....
    } /* end switch */
  End of data is indicated by scanf NOT delivering the value 1.

    Your output should be in the following style:

   Total lines                     126
    Total blank lines               3
    Total characters                3897
    Total spaces                    1844
    Total leading spaces            1180
    Total comments                  7
    Total chars in comments         234
    Total number of identifiers     132
    Total length of identifiers     606
    Total indenting errors          2
You may gather that the above program (together with the unstarred items) forms the basis of part of your marking system! Do the easy bits first, and leave it at that if some aspects worry you. Come back to me if you think my solution (or the specification) is wrong! That is quite possible!

Here is my incomplete solution:

#include<stdio.h>

typedef int bool;
#define true 1
#define false 0

int main(void){
    int ch;
    int numOfLines = 0;
    int numOfBlankLines = 0;
    int numOfCharAfterTab = 0;
    int numOfSpacesAfterTab = 0;

    bool isTabNow = false;
    bool isNewLine = false;
    bool isSpace = false;
    bool isChar = false;

    while((ch = getchar()) != EOF){

        if(ch == '\t')
        {
            isTabNow = true;
        }
        else if(ch == ' ')
        {
            if(isTabNow) 
            { 
                numOfSpacesAfterTab++; continue; 
            }  /* 4. Number of spaces after tab expansion*/
        }
        else if(ch == '\n')
        {
            isTabNow = false;
            numOfLines++;                         /* 1. Total number of Lines*/
            if(!isChar) 
            {
                numOfBlankLines++;
            }        /* 2. Total number of blank lines*/
            isChar = false;                       /* the next line starts out blank */
        }
        else if((ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9')||
                (ch >= '!' && ch <= '/') || (ch >= ':' && ch <= '@')||
                /* Referred ascii chart and compared 'ch' with ascii values of printable characters*/
                (ch >= 'A' && ch <= 'Z') || (ch >= '[' && ch <= '`') ||
                (ch >= '{' && ch <= '~'))
        {
            isChar=true;
            if(isTabNow){                          /* 3. Total number of characters after tab expansion*/
                numOfCharAfterTab++;
            }
        }
        else
        {
            printf("\n Invalid character %c", ch);
        }

    }// end while
}

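Questions:

The exercise above suggests using switch-case, but unlike the else-if syntax, switch-case cannot handle ranges of values.

With the solution above, I cannot work out the total number of characters after tab expansion. How can I solve this?

2 answers:

Answer 0: (score: 1)

To use a switch, there is a halfway approach and an all-the-way approach. The halfway approach is to switch on the characters that can be selected individually, and to put the remaining tests as ifs in the default case: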

switch( ch ) {
  case '\t' : ...
  case '\n' : ...
  case ' ':  ...
  default: 
      if (isdigit(ch)) ....
      else ....
}
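
As a rough, compilable version of the halfway approach, the sketch below wraps the switch in a function. The CharKind enum and the classify name are hypothetical, introduced only for this illustration, and the caller is assumed to have checked for EOF before calling it.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical categories, named here only for illustration. */
typedef enum { K_TAB, K_NEWLINE, K_SPACE, K_DIGIT, K_LETTER, K_OTHER } CharKind;

/* Classify one character: single values are handled by case labels,
   ranges are handled by <ctype.h> tests in the default branch. */
CharKind classify(int ch)
{
    /* The is*() functions need a value representable as unsigned char;
       EOF is assumed to have been filtered out by the caller. */
    assert(ch >= 0 && ch <= 255);

    switch (ch) {
    case '\t': return K_TAB;
    case '\n': return K_NEWLINE;
    case ' ':  return K_SPACE;
    default:
        if (isdigit(ch))
            return K_DIGIT;
        else if (isalpha(ch))
            return K_LETTER;
        else
            return K_OTHER;   /* punctuation and everything else */
    }
}
```

A reading loop would call classify(ch) for each character returned by getchar() and switch again on the result, or update its counters directly inside the cases.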

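In the all-the-way approach, we define a table that covers all the relevant characters and says what kind each one is. An enum is used to improve readability.

The table is built once, before the file is processed.

Building the table (NONE is set to 0 for clarity):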

typedef enum { NONE=0, CR, TAB, LETTER, DIGIT, QUOTE, DQUOTE /* ... */ } Kind;

Kind table[256] = { NONE }; // default is invalid

int c;
table[ '\t' ] = TAB;
table[ '\n' ] = CR;
for(c='a' ; c<='z' ; c++) table[ c ] = LETTER;
/* ... */

Then, in the program:

switch(table[ ch ]) {
   case TAB: ....
   case LETTER: ....
   case ...
   ...
}
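
Put together, the table approach might look like the sketch below. It reuses the Kind enum from above with one extra SPACE member that is my addition, and the build_table and count_alnum names are hypothetical. Like the loop above, it assumes a character set such as ASCII in which the letters are contiguous.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NONE = 0, CR, TAB, SPACE, LETTER, DIGIT, QUOTE, DQUOTE } Kind;

/* Static storage is zero-initialised, so every entry starts as NONE. */
static Kind table[256];

/* Fill the table once, before any input is processed. */
void build_table(void)
{
    int c;

    table['\t'] = TAB;
    table['\n'] = CR;
    table[' ']  = SPACE;
    table['\''] = QUOTE;
    table['"']  = DQUOTE;
    for (c = 'a'; c <= 'z'; c++) table[c] = LETTER;
    for (c = 'A'; c <= 'Z'; c++) table[c] = LETTER;
    for (c = '0'; c <= '9'; c++) table[c] = DIGIT;
}

/* Example consumer: count the letters and digits in a string by
   switching on the table entry instead of on the character itself. */
int count_alnum(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast keeps the index in 0..255 even if char is signed. */
        switch (table[(unsigned char)*s]) {
        case LETTER:
        case DIGIT:
            n++;
            break;
        default:
            break;   /* TAB, CR, SPACE, NONE, ... are not counted here */
        }
    }
    return n;
}
```

The point of the design is that a range test such as 'a' to 'z' is paid for once when the table is built, after which every character costs a single array lookup and the switch only ever sees a handful of enum values.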

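Answer 1: (score: 0)

You need to keep track of the "effective column", i.e. the column in which an editor would display the next character.

Rules:

  1. The effective column starts at 0 and increases by 1 for every character except tabs and newlines.

  2. For a newline, the effective column is reset to 0.

  3. For a tab, the effective column is increased by 1 and then "rounded up" to the nearest multiple of 8. The intermediate positions count as added space characters.

  4. You need to keep track of the number of spaces added because of the "round up to the nearest multiple of 8" (= tab expansion) rule.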
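
The four rules can be sketched in C roughly as follows. The TabStats struct and the function names are hypothetical, and the sketch assumes that the newline itself is not counted as a character, which the exercise text does not settle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the running totals; names are illustrative. */
typedef struct {
    int col;           /* effective column of the next character      */
    int total_chars;   /* characters after tab expansion              */
    int total_spaces;  /* spaces after tab expansion, tabs included   */
} TabStats;

/* Column reached after a tab typed at column col (columns start at 0). */
int next_tab_stop(int col)
{
    assert(col >= 0);
    return (col / 8 + 1) * 8;
}

/* Update the totals for one input character. */
void tab_stats_feed(TabStats *st, int ch)
{
    assert(st != NULL);

    if (ch == '\n') {
        st->col = 0;                     /* rule 2; newline not counted (assumption) */
    } else if (ch == '\t') {
        int added = next_tab_stop(st->col) - st->col;   /* rules 3 and 4 */
        st->col += added;
        st->total_chars += added;        /* a tab expands to 1..8 spaces */
        st->total_spaces += added;
    } else {
        st->col++;                       /* rule 1 */
        st->total_chars++;
        if (ch == ' ')
            st->total_spaces++;
    }
}
```

In the question's program, a TabStats value initialised to all zeros would be fed every character returned by getchar(), replacing the isTabNow flag. The leading-space count could be added in the same way by remembering whether a visible character has been seen on the current line.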