Handling comments and line/column numbers in a COBOL grammar with JavaCC

Time: 2013-03-04 15:18:14

Tags: java parsing javacc

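I am developing a COBOL parser with JavaCC. COBOL files usually reserve columns 1 through 6 for line/column numbers. If no line/column numbers are present, those columns contain spaces.

I need to know how to handle comments and the sequence area in a COBOL file so that only the main area is parsed.

I have tried many expressions, but none of them works properly. I created a special token that checks for a new line, then checks for six occurrences of either a space or any character other than a space and a carriage return; after that, the seventh character is "*" for a comment or " " for a normal line.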
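To make the column layout concrete, here is a minimal plain-Java sketch of the classification described above, independent of JavaCC. The names `ColumnLayoutSketch`, `LineKind`, and `classify` are hypothetical and are not part of cobol.jj. The sketch only recognizes the two indicator values mentioned above ("*" and " ") and reports anything else as `OTHER`.

```java
import java.util.List;

public class ColumnLayoutSketch {
    // Hypothetical categories for one fixed-format source line.
    enum LineKind { TOO_SHORT, CODE, COMMENT, OTHER }

    // Columns 1-6 (indices 0-5) hold the line/column numbers or spaces;
    // column 7 (index 6) decides how the rest of the line is treated.
    static LineKind classify(String line) {
        if (line.length() < 7) {
            return LineKind.TOO_SHORT;   // no seventh column to inspect
        }
        char indicator = line.charAt(6);
        if (indicator == ' ') {
            return LineKind.CODE;
        }
        if (indicator == '*') {
            return LineKind.COMMENT;
        }
        return LineKind.OTHER;           // assumption: everything else is left undecided
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "000100 IDENTIFICATION DIVISION.",
            "000200* a comment line",
            "       PROGRAM-ID. DEMO.",
            "");
        for (String line : lines) {
            System.out.println(classify(line) + " | " + line);
        }
    }
}
```

The lexer has to make the same decision, but per character and only at the start of each line, which is the part I cannot get right in the grammar.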

I am using the Cobol.jj file available here: http://java.net/downloads/javacc/contrib/grammars/cobol.jj

Can anyone suggest what grammar I should use?

A sample of my grammar file:

    PARSER_END(CblParser)

////////////////////////////////////////////////////////////////////////////////
// Lexical structure
////////////////////////////////////////////////////////////////////////////////

SPECIAL_TOKEN :
{
  < EOL: "\n" > : LINE_START 
| < SPACECHAR: ( " " | "\t" | "\f" | ";" | "\r" )+ >
}

SPECIAL_TOKEN :
{
  < COMMENT: ( ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ) ( "*" | "|" ) (~["\n","\r"])* >
| < PREPROC_COMMENT: "*|" (~["\n","\r"])* >
| < SPACE_SEPARATOR : ( <SPACECHAR> | <EOL> )+ >
| < COMMA_SEPARATOR : "," <SPACE_SEPARATOR> >
}

<LINE_START> SKIP :
{
 < ((~[])(~[])(~[])(~[])(~[])(~[])) (" ") >
}

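1 Answer:

Answer 0 (score: 1)

Since the parser starts at the beginning of a line, you should use the DEFAULT state to represent the start of a line. I would do the following [untested code].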

// At the start of each line, the first 6 characters are ignored and the 7th is used
// to determine whether this is a code line or a comment line.
// (Continuation lines are handled elsewhere.)
// If there are fewer than 7 characters on the line, it is ignored.
// Note that there will be a TokenMgrError if a line has at least 7 characters and
// the 7th character is anything other than a "*", a "/", or a space.
<DEFAULT> SKIP :
{
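   // Line terminators are excluded so that the six-character prefix cannot span lines.
   < (~["\n","\r"]){0,6} ("\n" | "\r" | "\r\n") > : DEFAULT
|
   < (~["\n","\r"]){6} (" ") > : CODE
|
   < (~["\n","\r"]){6} ("*"|"/") > : COMMENT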
}

<COMMENT> SKIP :
{   // At the end of a comment line, return to the DEFAULT state.
    < "\n" | "\r" | "\r\n" > : DEFAULT
|   // All non-end-of-line characters on a comment line are ignored.
    < ~["\n","\r"] > : COMMENT
}
<CODE> SKIP :
{   // At the end of a code line, return to the DEFAULT state.
    < "\n" | "\r" | "\r\n" > : DEFAULT
|   // White space is skipped, as are semicolons.
    < ( " " | "\t" | "\f" | ";" )+ >
}
<CODE> TOKEN :
{  
    < ACCEPT: "accept" >
|  
    ... // all rules for tokens should be in the CODE state.
}
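
Since the grammar above is untested, it can help to check the intended state transitions against sample input before generating the parser. The following is a minimal plain-Java model of the three lexical states, not JavaCC output. The names `LineStateSketch` and `codeAreaOnly` are hypothetical, the model collects the raw text seen in the CODE state instead of producing tokens, and it throws an `IllegalArgumentException` where the generated token manager would be expected to report a lexical error.

```java
public class LineStateSketch {
    // Mirrors the lexical states used in the grammar above.
    enum State { DEFAULT, CODE, COMMENT }

    // Returns the text that the CODE state would see, one output line per code line.
    static String codeAreaOnly(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.DEFAULT;
        int column = 0; // characters consumed so far on the current line

        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n' || c == '\r') {
                // Treat "\r\n" as a single line terminator.
                if (c == '\r' && i + 1 < source.length() && source.charAt(i + 1) == '\n') {
                    i++;
                }
                if (state == State.CODE) {
                    out.append('\n');
                }
                // Every line terminator returns to the start-of-line state.
                state = State.DEFAULT;
                column = 0;
                continue;
            }
            column++;
            switch (state) {
                case DEFAULT:
                    // Columns 1-6 are ignored; column 7 selects the next state.
                    if (column == 7) {
                        if (c == ' ') {
                            state = State.CODE;
                        } else if (c == '*' || c == '/') {
                            state = State.COMMENT;
                        } else {
                            throw new IllegalArgumentException(
                                "Unexpected indicator '" + c + "' in column 7");
                        }
                    }
                    break;
                case CODE:
                    out.append(c);   // the real lexer would form tokens here
                    break;
                case COMMENT:
                    break;           // comment text is discarded
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "000100 MOVE A TO B.\n000200* comment\nshort\n       STOP RUN.\r\n";
        System.out.print(codeAreaOnly(source));
    }
}
```

Lines shorter than seven characters never leave the DEFAULT state in this model, which matches the first rule of the DEFAULT SKIP block.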