Antlr4 - Input

Date: 2017-07-07 02:37:21

Tags: go antlr4

I am trying to create a simple HOCON parser (starting from an existing JSON parser).

The grammar is defined as:

/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */

// Derived from http://json.org
grammar HOCON;

hocon
   : value
   | pair
   ;

obj
   : object_begin pair (','? pair)* object_end
   | object_begin object_end
   ;

pair
   : STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
   | KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
   ;

array
   : array_begin value (',' value)* array_end
   | array_begin array_end
   ;

value
   : STRING {fmt.Println($STRING.GetText())}
   | REFERENCE {fmt.Println($REFERENCE.GetText())}
   | RAWSTRING {fmt.Println($RAWSTRING.GetText())}
   | NUMBER {fmt.Println($NUMBER.GetText())}
   | obj
   | array
   | 'true'
   | 'false'
   | 'null'
   ;

COMMENT
   : '#' ~( '\r' | '\n' )* -> skip
   ;

STRING
   : '"' (ESC | ~ ["\\])* '"'
   | '\'' (ESC | ~ ['\\])* '\''
   ;

RAWSTRING
   : (ESC | ALPHANUM)+
   ;

KEY
   : ( '.' | ALPHANUM | '-')+
   ;

REFERENCE
   : '${' (ALPHANUM|'.')+ '}'
   ;

fragment ESC
   : '\\' (["\\/bfnrt] | UNICODE)
   ;


fragment UNICODE
   : 'u' HEX HEX HEX HEX
   ;

fragment ALPHANUM
   : [0-9a-zA-Z]
   ;

fragment HEX
   : [0-9a-fA-F]
   ;

KV
   : [=:]
   ;

array_begin
   : '[' { fmt.Println("BEGIN [") }
   ;

array_end
   : ']' { fmt.Println("] END") }
   ;

object_begin
   : '{' { fmt.Println("OBJ {") }
   ;

object_end
   : '}' { fmt.Println("} OBJ") }
   ;

NUMBER
   : '-'? INT '.' [0-9] + EXP? | '-'? INT EXP | '-'? INT
   ;

fragment INT
   : '0' | [1-9] [0-9]*
   ;

// no leading zeros

fragment EXP
   : [Ee] [+\-]? INT
   ;

// \- since - means "range" inside [...]

WS
   : [ \t\n\r] + -> skip
   ;

The error is:

line 2:2 no viable alternative at input '{journal'
pairkey akka.persistence

A sample input that produces the error is:

akka.persistence {
  journal {
    # Absolute path to the journal plugin configuration entry used by
    # persistent actor or view by default.
    # Persistent actor or view can override `journalPluginId` method
    # in order to rely on a different journal plugin.
    plugin = ""
  }
}

However, if I update it to use quoted strings:

akka.persistence {
  'journal' {
    # Absolute path to the journal plugin configuration entry used by
    # persistent actor or view by default.
    # Persistent actor or view can override `journalPluginId` method
    # in order to rely on a different journal plugin.
    'plugin' = ""
  }
}

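Everything works as expected.

It looks like I am missing something in the KEY definition, but I cannot work out exactly what.

The Go code used to test it is: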

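package main

import (
    "log"

    "github.com/antlr/antlr4/runtime/Go/antlr"
    "go-hocon/parser"
)

func main() {
    // Read the sample configuration; stop early if the file cannot be opened.
    is, err := antlr.NewFileStream("test/simple1.conf")
    if err != nil {
        log.Fatal(err)
    }

    lex := parser.NewHOCONLexer(is)
    p := parser.NewHOCONParser(antlr.NewCommonTokenStream(lex, 0))
    p.BuildParseTrees = true
    // Invoke the start rule; the embedded grammar actions print as it parses.
    p.Hocon()
}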

1 answer:

Answer 0: (score: 1)

Your first input makes journal lex as a RAWSTRING:

[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
line 2:2 no viable alternative at input '{journal'
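
For reading the dump: each line has the shape [@index,start:stop='text',<type>,line:column], and the trailing line:column pair is the position the error message refers to (token @2 sits at 2:2, which is where the error is reported). The snippet below is only a rough sketch for pulling those fields apart; the names dumpLine and parseDumpLine are made up for this illustration and are not part of the ANTLR runtime.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// dumpLine holds the fields of one line of the token dump.
type dumpLine struct {
	Index, Start, Stop int    // token index and character offsets
	Text, Type         string // matched text and token type name
	Line, Column       int    // position used in error messages
}

// dumpRE matches lines such as [@2,22:28='journal',<RAWSTRING>,2:2].
var dumpRE = regexp.MustCompile(`^\[@(\d+),(\d+):(\d+)='(.*)',<(.+)>,(\d+):(\d+)\]$`)

// parseDumpLine splits a dump line into its fields; ok is false if the
// line does not have the expected shape.
func parseDumpLine(s string) (d dumpLine, ok bool) {
	m := dumpRE.FindStringSubmatch(s)
	if m == nil {
		return d, false
	}
	// The regexp only lets digits through, so the conversion error is ignored.
	n := func(i int) int { v, _ := strconv.Atoi(m[i]); return v }
	return dumpLine{n(1), n(2), n(3), m[4], m[5], n(6), n(7)}, true
}

func main() {
	d, ok := parseDumpLine(`[@2,22:28='journal',<RAWSTRING>,2:2]`)
	fmt.Println(ok, d.Type, d.Text, d.Line, d.Column)
}
```

The interesting field here is the type: journal comes out as RAWSTRING, not KEY, which is what the parser then trips over.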

On the other hand, 'journal' lexes as a STRING, but with those single quotes that you obviously don't want:

[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:30=''journal'',<STRING>,2:2]  <-- now it's a STRING token
[@3,32:32='{',<'{'>,2:12]
[@4,279:284='plugin',<RAWSTRING>,7:4]
[@5,286:286='=',<KV>,7:11]
[@6,288:289='""',<STRING>,7:13]
[@7,294:294='}',<'}'>,8:2]
[@8,297:297='}',<'}'>,9:0]
[@9,300:299='<EOF>',<EOF>,10:0]
line 7:4 no viable alternative at input '{plugin'
line 8:2 mismatched input '}' expecting {'true', 'false', 'null', '[', '{', STRING, RAWSTRING, REFERENCE, KV, NUMBER}

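Why? Because lexer rules are matched in the following way:
1. The longest match wins.
2. Implicit tokens (literals written directly in parser rules) are matched.
3. If the matches are of equal length, the lexer rule defined first wins.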

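In your case, quoting 'journal' makes it lex as a STRING, so it appears to work, but only because of those single quotes. Without the quotes, journal and plugin are both matched as RAWSTRING: RAWSTRING and KEY match the same number of characters, and RAWSTRING is defined before KEY (rule 3 above). A RAWSTRING does not fit the rule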

pair
   : STRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}

Hence the error.
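
To make the longest-match and rule-order behaviour concrete, here is a small Go sketch that imitates it with regular expressions. It is not how ANTLR is implemented; the rules table and nextToken are invented for this example, only three of the lexer rules are modelled, and ESC handling is simplified (left out of RAWSTRING, reduced to "backslash plus any character" in STRING).

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a lexer rule name with a regexp anchored at the start of input.
type rule struct {
	name string
	re   *regexp.Regexp
}

// Simplified versions of three lexer rules, in the order they appear in the
// grammar from the question. Assumption: ESC is dropped from RAWSTRING.
var rules = []rule{
	{"STRING", regexp.MustCompile(`^(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')`)},
	{"RAWSTRING", regexp.MustCompile(`^[0-9a-zA-Z]+`)},
	{"KEY", regexp.MustCompile(`^[.0-9a-zA-Z-]+`)},
}

// nextToken picks a rule for the start of input: the longest match wins,
// and on a tie the rule listed first wins.
func nextToken(input string) (name, text string) {
	for _, r := range rules {
		m := r.re.FindString(input)
		if len(m) > len(text) { // strictly longer, so earlier rules keep ties
			name, text = r.name, m
		}
	}
	return name, text
}

func main() {
	for _, in := range []string{"journal {", "'journal' {", "akka.persistence {"} {
		name, text := nextToken(in)
		fmt.Printf("%-10s %q\n", name, text)
	}
}
```

With these simplified rules, journal ties between RAWSTRING and KEY and goes to RAWSTRING because it is listed first, while akka.persistence goes to KEY because only KEY can consume the dot and so produces the longer match.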

How to fix it? Well, I reversed the lexer rules:

RAWSTRING
   : (ESC | ALPHANUM)+
   ;

STRING
   : '"' (ESC | ~ ["\\])* '"'
   | '\'' (ESC | ~ ['\\])* '\''
   ;

and changed pair:

pair
   : RAWSTRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}

Now it parses fine:

[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
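
As a rough cross-check of why this token sequence is now accepted, the sketch below walks the token types from the dump with a hand-written checker that mimics the pair, obj and value rules. It is an illustration only: checker and parses are hypothetical names, arrays and the true/false/null literals are left out, and it assumes the KEY alternative of pair is kept alongside the changed first alternative (the snippet above does not show it).

```go
package main

import "fmt"

// sample lists the token types from the dump above, without EOF.
var sample = []string{"KEY", "{", "RAWSTRING", "{", "RAWSTRING", "KV", "STRING", "}", "}"}

// checker walks token type names, mimicking pair, obj and value.
type checker struct {
	toks []string        // token type names
	pos  int             // index of the next token to consume
	keys map[string]bool // token types that pair accepts as a key
}

func (c *checker) peek() string {
	if c.pos < len(c.toks) {
		return c.toks[c.pos]
	}
	return "EOF"
}

// accept consumes the next token if it has the given type.
func (c *checker) accept(t string) bool {
	if c.peek() == t {
		c.pos++
		return true
	}
	return false
}

// pair : key KV? value
func (c *checker) pair() bool {
	if !c.keys[c.peek()] {
		return false
	}
	c.pos++
	c.accept("KV")
	return c.value()
}

// obj : '{' pair (','? pair)* '}' | '{' '}'
func (c *checker) obj() bool {
	if !c.accept("{") {
		return false
	}
	if c.accept("}") {
		return true
	}
	if !c.pair() {
		return false
	}
	for c.peek() != "}" {
		c.accept(",")
		if !c.pair() {
			return false
		}
	}
	return c.accept("}")
}

// value : STRING | REFERENCE | RAWSTRING | NUMBER | obj (arrays omitted)
func (c *checker) value() bool {
	switch c.peek() {
	case "STRING", "REFERENCE", "RAWSTRING", "NUMBER":
		c.pos++
		return true
	case "{":
		return c.obj()
	}
	return false
}

// parses reports whether toks form one top-level pair, and the token index
// at which checking stopped.
func parses(toks []string, keyTypes ...string) (bool, int) {
	keys := map[string]bool{}
	for _, k := range keyTypes {
		keys[k] = true
	}
	c := &checker{toks: toks, keys: keys}
	ok := c.pair() && c.pos == len(toks)
	return ok, c.pos
}

func main() {
	fmt.Println(parses(sample, "STRING", "KEY"))    // original pair rule
	fmt.Println(parses(sample, "RAWSTRING", "KEY")) // changed pair rule
}
```

With the original key types the checker stops at token index 2 (journal), the same token the "no viable alternative" message points at; with RAWSTRING accepted as a key it consumes all nine tokens.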