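ANTLR4 accepts additional tokens as valid?

Asked: 2014-03-10 16:08:49

Tags: java parsing antlr4

I'm building a small rule language to test and get used to ANTLR. I'm using ANTLR v4, and my grammar is split up as follows: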

Lexer.g4

lexer grammar Lexer;

/*------------------------------------------------------------------
 * LEXER RULES - GENERIC KEYWORDS
 *------------------------------------------------------------------*/
NOT
    : 'not'
    ;

NULL
    : 'null'
    ;

AND
    : 'and'
    | '&'
    ;

/*------------------------------------------------------------------
 * LEXER RULES - PATTERN MATCHING
 *------------------------------------------------------------------*/
DELIM
    : [\|\\/:,&@+><^]
    ;

WS 
    : [ \t\r\n]+ -> skip 
    ;

VALUE 
    : SQUOTE TEXT SQUOTE
    ;

fragment SQUOTE
    : '\'' 
    ;

fragment TEXT 
    : ( 'a'..'z' 
      | 'A'..'Z'
      | '0'..'9'
      | '-'
      )+ ;

Attribute.g4

grammar Attribute;

/*------------------------------------------------------------------
 * Semantic Predicate
 *
 * Attributes are capitalised words that may have spaces.  They're 
 * loaded from the database and set in the glue code so that
 * they can be cross checked here.  If the grammar passed in sees
 * an attribute it will pass so long as the attribute is in the 
 * database, otherwise the grammar will fail to parse.
 *------------------------------------------------------------------*/  
attr
    : a=ATTR {attributes.contains($a.text)}?
    ;

ATTR
    : ([A-Z][a-zA-Z0-9/]+([ ][A-Z][a-zA-Z0-9/]+)?)
    ;

ReplaceInWith.g4

grammar ReplaceInWith;

/*------------------------------------------------------------------
 * REPLACE IN WITH PARSER RULES
 *------------------------------------------------------------------*/
replace_in_with
    : rep in with {row.put($in.value    , $in.value.replace($rep.value, $with.value));}
    | repAtt with {row.put($repAtt.value, $with.value);}
    ;

rep returns[String value]
    : REPLACE v=VALUE {$value = trimQuotes($v.text);}
    ;

repAtt returns[String value]
    : REPLACE a=attr  {$value = $a.text;}
    ;

in returns[String value]
    : IN a=attr {$value = $a.text;}
    ;

with returns[String value]
    : WITH v=VALUE {$value = trimQuotes($v.text);}
    ;

/*------------------------------------------------------------------
 * LEXER RULES - KEYWORDS
 *------------------------------------------------------------------*/
REPLACE
    : 'rep'
    | 'replace'
    ;

IN
    : 'in'
    ;

WITH
    : 'with'
    ;

Parser.g4

grammar Parser;

/*------------------------------------------------------------------
 * IMPORTED RULES
 *------------------------------------------------------------------*/
 import //Essential imports
    Attribute,
    GlueCode,
    Lexer,

    //Actual Rules
    ReplaceInWith;

/*------------------------------------------------------------------
 * PARSER RULES
 * MUST ADD EACH TOP LEVEL RULE HERE FOR IT TO BE CALLABLE
 *------------------------------------------------------------------*/
eval
    : replace_in_with
    ;

GlueCode.g4

Java to supply static calling functionality to the grammar and to set the attributes up from the database.

ParserErrorListener.java

import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.misc.NotNull;

public class ParserErrorListener extends ParserBaseListener
{
    /**
     * After every rule check to see if an exception was thrown, if so exit with a runtime exception to indicate a 
     * parser problem.<p>
     */
    @Override 
    public void exitEveryRule(@NotNull ParserRuleContext ctx) 
    { 
        super.exitEveryRule(ctx);

        if (ctx.exception != null)
        {
            throw new ParserRuntimeException(String.format("Error evaluating expression(s) '%s'", ctx.exception));
        } //if
    } //exitEveryRule
} //class

When I feed the grammar the following input, it passes as expected:

"replace 'Acme' in Name with 'acme'",
"rep 'Acme' in Name with 'acme'",
"replace 'Acme' in Name with 'ACME'",
"rep 'Acme' in Name with 'ACME'",
"replace 'e' in Name with 'i'",
"rep 'e' in Name with 'i'",

"replace '-' in Number with ' '",
"rep '-' in Number with ' '",
"replace '555' in Number with '00555'",
"rep '555' in Number with '00555'"

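with Name and Number set as the attributes for the semantic predicate.

However, when I pass in the following statements, the grammar still passes, and I'm not sure why it matches: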

"replace any 'Acme' in Name with 'acme'",
"replaceany 'Acme' in Name with 'acme'",

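Name is again passed in as the attribute to be matched by the semantic predicate, and that part of the grammar works in my tests. The part that fails is the 'any' part. The grammar matches replace and then takes what it considers to be the next token, 'Acme', ignoring the 'any' in both examples above. What I expected here was for the grammar to fail: in the exit-rule listener I added a check that should throw a runtime exception, to be caught by the GlueCode, to indicate the failure.

Any ideas on how to get my grammar to fail when an error occurs?

1 Answer:

Answer 0: (score: 1)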

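  1. First, lexer rules are always global in ANTLR. Every token in the input is assigned one, and only one, token type. If you split your lexer rules across multiple files, working out where tokens are ambiguous becomes a maintenance nightmare. The general rule is:

    Avoid using import for lexer grammars that contain rules not marked with the fragment modifier.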

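  2. The ATTR token type is assigned to any input that matches ATTR, whether or not the predicate in the attr rule succeeds. This prevents input that matches the ATTR rule from being treated as any other token type. You should move the semantic predicate from the attr rule to the ATTR rule, so that the lexer does not create ATTR tokens for input that is not in the predefined set of attributes.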

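  3. The ParserRuleContext.exception field is not guaranteed to be set when a syntax error occurs. The only ways to determine that no syntax error occurred are to call Parser.getNumberOfSyntaxErrors() after parsing, or to add your own ANTLRErrorListener.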

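  4. Your last lexer rule should look like the following. Otherwise, input sequences that do not match any lexer rule are silently dropped. This rule passes such input on to the parser for handling and reporting.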

    ErrorChar : . ;
    
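  5. For complicated grammars, avoid combined grammars. Instead, create a lexer grammar and a parser grammar, where the parser grammar uses the tokenVocab option to import the tokens. Combined grammars let you declare lexer rules implicitly by writing a string literal in a parser rule, which reduces the maintainability of large grammars.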

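  6. ReplaceInWith.g4 contains many rules with embedded actions. These actions should be moved to a separate listener that runs after the parse is complete, and the returns clauses should be removed from these rules. This improves the portability and reusability of the grammar. Examples of how to do this can be seen in these commits, which are part of a larger pull request showing the conversion of an application using ANTLR 3 to ANTLR 4.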
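
The effect described in point 4 can be imitated with a toy tokenizer, which may help show why the two 'any' inputs from the question still parse. This is only a minimal sketch in plain Java and not ANTLR's implementation: the class CatchAllDemo, its tokenize method and the catchAll flag are invented for illustration, and the patterns copy just a subset of the question's lexer rules. It assumes, as the grammars shown suggest, that no lexer rule matches lower-case words such as any.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy longest-match tokenizer (hypothetical, for illustration only). */
class CatchAllDemo {
    // Names and patterns mirror a subset of the question's lexer rules, in priority order.
    private static final String[] NAMES = {"WS", "REPLACE", "IN", "WITH", "VALUE", "ATTR"};
    private static final Pattern[] PATTERNS = {
        Pattern.compile("[ \\t\\r\\n]+"),
        Pattern.compile("replace|rep"),
        Pattern.compile("in"),
        Pattern.compile("with"),
        Pattern.compile("'[a-zA-Z0-9-]+'"),
        Pattern.compile("[A-Z][a-zA-Z0-9/]+( [A-Z][a-zA-Z0-9/]+)?"),
    };

    /**
     * Returns the token type names for the input. When catchAll is false,
     * characters that match no rule are dropped; when it is true, each such
     * character becomes an ErrorChar token, as a final ". " rule would do.
     */
    static List<String> tokenize(String input, boolean catchAll) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            String bestName = null;
            // Pick the longest match; on a tie the earlier rule wins.
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = NAMES[i];
                }
            }
            if (bestName == null) {
                // No rule matched this character.
                if (catchAll) {
                    tokens.add("ErrorChar");
                }
                pos++;
            } else {
                if (!bestName.equals("WS")) { // whitespace is skipped
                    tokens.add(bestName);
                }
                pos += bestLen;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "replace any 'Acme' in Name with 'acme'";
        System.out.println(tokenize(input, false));
        System.out.println(tokenize(input, true));
    }
}
```

Without the catch-all, the sketch yields REPLACE, VALUE, IN, ATTR, WITH, VALUE, which is exactly the sequence the rep and in rules expect, so nothing looks wrong to the parser. With the catch-all, three ErrorChar tokens appear between REPLACE and VALUE, and a parser receiving that sequence has something to reject.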