Question

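Good evening. I am writing a scanner for a compilers class. We have a test file that we must scan, printing the line each token is on, what the token is, and its ID number. The program works correctly, except for the last character in the test file, which is a . (period). That period is actually on line 17, but my scanner outputs it on line 18, along with the EOF token. I'm hoping a fresh pair of eyes can spot what I'm missing. All other tokens are output on their respective lines. Here is the scanner itself. The scanner has a few other functions, but I don't believe they are needed for this question.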

void scanner(FILE *file) {
const int FINAL_STATE = -1, ERROR_STATE = -2;
char next_char;
static int line_count = 1;
string s = "";
int next_state, state = 0;

while(state != FINAL_STATE) {
  next_char = get_char(file);

  // deal with comments
  if(next_char == '&') {
     next_char = get_char(file);
     while(next_char != '\n') {
        next_char = get_char(file);
        if (next_char == '\n') {
            line_count++;
        }
     }
     continue;
  }

  // count lines
  if(next_char == '\n') {
     line_count++;
  }

  // deal with EOF
  if(next_char == EOF) {
     tk.lexeme = "EOF";
     tk.tk_num = eof_tk;
     tk.line_num = line_count;
     return;
  }
  next_state = table[state][c_val(next_char)];
  if(next_state == ERROR_STATE) {
     cout << "error on line [" << line_count << "]\n";
     exit(0);
  }

  // deal with final state         <------------I think my problem is here
  if(next_state == FINAL_STATE) {
     if(!isspace(next_char)) {
        ungetc(next_char, file);
     }

     if(table[state][1] == id_tk) {
        for(int t = 0; t < size(keywords); t++) {
           if(keywords[t].compare(s) == 0) {
              tk.lexeme = s;
              tk.tk_num = key_assign(t);
              tk.line_num = line_count;
              return;
           }

           else {
              tk.lexeme = s;
              tk.tk_num = id_tk;
              tk.line_num = line_count;
           }
        }

     if(tk.lexeme == "") {
        tk.lexeme = s;                                              
     }
     }

     else {
        tk.lexeme = s;                                      // string
        tk.tk_num = (token_type)table[state][1];            // type
        tk.line_num = line_count;                           // line
     }

     return;
  }

  state = next_state;

  if(!isspace(next_char)) {
     s += next_char;
  }
 }
}

Here is the main call to the scanner function:

 while(!feof(fp)) {
        scanner(fp);
        cout << "Line: " << tk.line_num << " Token: " << tk.lexeme << " Instance: " << tk.tk_num << endl;
    }

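If more code is needed I am happy to edit this post, but I don't want to overload it with code. Last but not least, here is the test file in its original format: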

& First list of all separetd by spaces to make sure nothing is missing

qwerty uiop asdfg hjkl zxcv bnm a12345 a67890 a_ a_b abcdefghij

Start Stop Then If Iff While Var Int Float Do Read Write Void Return Dummy Program

= == < > !  +  -  *  / %  =< =>

. (  ) , { } ; [ ] :

12345 67890 001 0123456789

& now some tokens without space separators

Start_ Start.Stop Start+Stop Then=If If==Iff WhileInt start stop

x=a x==a x<=1 x>=2 x,y(z){x;y:u}[1,2,3]. <-------- This period

Also, here is the output of the program. Note that these are only the last few lines.

Line: 17 Token: y Instance: 1
Line: 17 Token: : Instance: 10
Line: 17 Token: u Instance: 1
Line: 17 Token: } Instance: 22
Line: 17 Token: [ Instance: 24
Line: 17 Token: 1 Instance: 2
Line: 17 Token: , Instance: 20
Line: 17 Token: 2 Instance: 2
Line: 17 Token: , Instance: 20
Line: 17 Token: 3 Instance: 2
Line: 17 Token: ] Instance: 25
Line: 18 Token: . Instance: 17        <---------This last token should be on 17
Line: 18 Token: EOF Instance: 0

Thanks, everyone, for taking a look. I appreciate it.

Answer 1

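It looks like your scanner increments the line number before the result is printed. The next_char after the . is \n (most text editors insert a hidden newline at the end of the file), so line_count is probably being incremented prematurely.

I would try removing the final \n from the file and see whether that changes the result.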
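
One way to make the reported line independent of the delimiter that ends a token is to remember the line on which the token started and to count a newline only after the pending token has been emitted. The following is a minimal sketch of that idea, not the asker's table-driven scanner: `lex_with_lines` and `token_line` are hypothetical names, and tokens are assumed to be separated by whitespace only.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): splits the input into
// whitespace-separated lexemes and records the line each lexeme STARTS on.
std::vector<std::pair<int, std::string>> lex_with_lines(const std::string& input) {
    std::vector<std::pair<int, std::string>> tokens;
    int line_count = 1;   // current line in the input
    int token_line = 1;   // line on which the pending lexeme began
    std::string s;

    for (char ch : input) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            // Emit the pending lexeme BEFORE counting the newline that ends it.
            if (!s.empty()) {
                tokens.emplace_back(token_line, s);
                s.clear();
            }
            if (ch == '\n') {
                ++line_count;
            }
        } else {
            if (s.empty()) {
                token_line = line_count;  // remember where this lexeme starts
            }
            s += ch;
        }
    }
    if (!s.empty()) {
        tokens.emplace_back(token_line, s);  // lexeme ended by end of input
    }
    return tokens;
}
```

With the input "a b\nc .\n", the final . is reported on line 2, even though a newline follows it. In the scanner from the question, the equivalent change would be to assign tk.line_num from a value saved when the first character of the lexeme is read.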

Answer 2

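@diclophis has explained one of your problems well.

(get_char() is not shown, but I assume it behaves like getchar().)

1. Incorrect EOF test

if(next_char == EOF) { is wrong. next_char has type char, while EOF has type int. The code could read a byte that has the same 8-bit pattern as EOF but is _not_ EOF, and exit on the wrong byte. Use int next_char to fix this, and make sure get_char() returns an int, as getchar() does.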
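
A minimal sketch of the int-based read loop, assuming get_char() is a thin wrapper around fgetc(). The helper name `count_bytes` is hypothetical and only serves to show that a 0xFF byte in the data is not mistaken for end of file when the result is kept in an int. (With a signed char, 0xFF typically converts to -1 and compares equal to EOF; with an unsigned char, the comparison with EOF can never be true.)

```cpp
#include <cstdio>

// Hypothetical helper: counts the bytes in a stream. The result of fgetc()
// is kept in an int, so every byte value 0..255 stays distinct from EOF.
long count_bytes(std::FILE* file) {
    long n = 0;
    int c;  // int, not char
    while ((c = std::fgetc(file)) != EOF) {
        ++n;
    }
    return n;
}
```

For a stream containing the two bytes 0xFF and 'a', this loop counts both bytes instead of stopping at the first one.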

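2. Potential infinite loop

If '&' is the last byte in the file (or, more generally, if a comment is not terminated by '\n' before the end of the file), this loop never exits, because next_char never becomes '\n'.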

if(next_char == '&') {
  next_char = get_char(file);
  while(next_char != '\n') {
    ...
    }
 }
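
A sketch of a comment skipper that also terminates at end of file, again assuming get_char() behaves like fgetc(). `skip_comment` is a hypothetical helper; it returns the number of newlines it consumed so the caller can update line_count.

```cpp
#include <cstdio>

// Hypothetical helper: consumes the rest of a '&' comment, up to and
// including the terminating newline. Returns 1 if a newline was consumed,
// or 0 if end of file was reached first, so the loop cannot spin forever.
int skip_comment(std::FILE* file) {
    int c;
    while ((c = std::fgetc(file)) != EOF) {
        if (c == '\n') {
            return 1;
        }
    }
    return 0;  // comment ran to end of file without a newline
}
```

The caller would do something like line_count += skip_comment(file); after reading the '&'.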

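3. Incorrect feof() test. feof() returns true only after an attempt to read past the last byte has found no more data, not when the next read would fail.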

while(!feof(fp)) {

The idiomatic form is recommended instead:

int next_char; while((next_char = get_char(file)) != EOF) { ...
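
The same principle can be applied to the driver loop: stop when the scanner itself reports the EOF token, rather than polling feof() beforehand. The sketch below is only an illustration under stated assumptions. `Token`, `next_token`, and `scan_all` are hypothetical stand-ins for the question's tk, scanner(), and main loop, and each character is treated as its own token.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical token type standing in for the question's global tk.
struct Token {
    std::string lexeme;
    bool is_eof;
};

// Hypothetical stand-in for scanner(): one character per token.
Token next_token(std::FILE* fp) {
    int c = std::fgetc(fp);
    if (c == EOF) {
        return {"EOF", true};
    }
    return {std::string(1, static_cast<char>(c)), false};
}

// Driver loop that ends on the scanner's own EOF token, not on feof().
std::vector<std::string> scan_all(std::FILE* fp) {
    std::vector<std::string> lexemes;
    for (;;) {
        Token t = next_token(fp);
        lexemes.push_back(t.lexeme);
        if (t.is_eof) {
            break;
        }
    }
    return lexemes;
}
```

In the original program, the corresponding test would be on tk.tk_num == eof_tk after each call to scanner(fp).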
