I am trying to create a simple HOCON parser, starting from an existing JSON parser.
The grammar is defined as:
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar HOCON;
hocon
: value
| pair
;
obj
: object_begin pair (','? pair)* object_end
| object_begin object_end
;
pair
: STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
| KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
;
array
: array_begin value (',' value)* array_end
| array_begin array_end
;
value
: STRING {fmt.Println($STRING.GetText())}
| REFERENCE {fmt.Println($REFERENCE.GetText())}
| RAWSTRING {fmt.Println($RAWSTRING.GetText())}
| NUMBER {fmt.Println($NUMBER.GetText())}
| obj
| array
| 'true'
| 'false'
| 'null'
;
COMMENT
: '#' ~( '\r' | '\n' )* -> skip
;
STRING
: '"' (ESC | ~ ["\\])* '"'
| '\'' (ESC | ~ ['\\])* '\''
;
RAWSTRING
: (ESC | ALPHANUM)+
;
KEY
: ( '.' | ALPHANUM | '-')+
;
REFERENCE
: '${' (ALPHANUM|'.')+ '}'
;
fragment ESC
: '\\' (["\\/bfnrt] | UNICODE)
;
fragment UNICODE
: 'u' HEX HEX HEX HEX
;
fragment ALPHANUM
: [0-9a-zA-Z]
;
fragment HEX
: [0-9a-fA-F]
;
KV
: [=:]
;
array_begin
: '[' { fmt.Println("BEGIN [") }
;
array_end
: ']' { fmt.Println("] END") }
;
object_begin
: '{' { fmt.Println("OBJ {") }
;
object_end
: '}' { fmt.Println("} OBJ") }
;
NUMBER
: '-'? INT '.' [0-9] + EXP? | '-'? INT EXP | '-'? INT
;
fragment INT
: '0' | [1-9] [0-9]*
;
// no leading zeros
fragment EXP
: [Ee] [+\-]? INT
;
// \- since - means "range" inside [...]
WS
: [ \t\n\r] + -> skip
;
The error is:
line 2:2 no viable alternative at input '{journal'
pairkey akka.persistence
The sample input that produces the error is:
akka.persistence {
journal {
# Absolute path to the journal plugin configuration entry used by
# persistent actor or view by default.
# Persistent actor or view can override `journalPluginId` method
# in order to rely on a different journal plugin.
plugin = ""
}
}
However, if I change it to use quoted strings:
akka.persistence {
'journal' {
# Absolute path to the journal plugin configuration entry used by
# persistent actor or view by default.
# Persistent actor or view can override `journalPluginId` method
# in order to rely on a different journal plugin.
'plugin' = ""
}
}
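everything works as expected.
It looks like I am missing something in the KEY definition, but I cannot figure out exactly what.
The Go code used to test it is: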
package main
import (
	"log"
	"github.com/antlr/antlr4/runtime/Go/antlr"
	"go-hocon/parser"
)
func main() {
	// NewFileStream returns an error when the file cannot be read.
	is, err := antlr.NewFileStream("test/simple1.conf")
	if err != nil {
		log.Fatal(err)
	}
	lex := parser.NewHOCONLexer(is)
	p := parser.NewHOCONParser(antlr.NewCommonTokenStream(lex, 0))
	p.BuildParseTrees = true
	p.Hocon()
}
Answer 0 (score: 1)
With your first input, journal is lexed as RAWSTRING:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
line 2:2 no viable alternative at input '{journal'
On the other hand, 'journal' is lexed as a STRING, but it carries those single quotes that you obviously do not want:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:30=''journal'',<STRING>,2:2] <-- now it's a STRING token
[@3,32:32='{',<'{'>,2:12]
[@4,279:284='plugin',<RAWSTRING>,7:4]
[@5,286:286='=',<KV>,7:11]
[@6,288:289='""',<STRING>,7:13]
[@7,294:294='}',<'}'>,8:2]
[@8,297:297='}',<'}'>,9:0]
[@9,300:299='<EOF>',<EOF>,10:0]
line 7:4 no viable alternative at input '{plugin'
line 8:2 mismatched input '}' expecting {'true', 'false', 'null', '[', '{', STRING, RAWSTRING, REFERENCE, KV, NUMBER}
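Why? Because the lexer chooses between rules as follows:
1. The longest match wins.
2. On a tie, implicit tokens (literals written inside parser rules, such as '{' or 'true' in this grammar) take priority.
3. Otherwise, when the matches are of equal length, the lexer rule that is declared first wins.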
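In your case, the quoted 'journal' matches the single-quoted alternative of STRING, so it appears to work, but only because of those single quotes. Without quotes, journal and plugin match both RAWSTRING and KEY with the same length; RAWSTRING is declared first, so both tokens are lexed as RAWSTRING, which does not fit the rule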
pair
: STRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}
Hence the error.
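The tie-breaking described above can be approximated outside ANTLR. The following Go sketch is only an illustration: the regular expressions are simplified stand-ins for the STRING, RAWSTRING and KEY rules (escape sequences are left out), and the `classify` helper is hypothetical, not part of the ANTLR runtime.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified stand-in for one lexer rule of the grammar.
type rule struct {
	name string
	re   *regexp.Regexp
}

// Rules in the same relative order as the grammar: RAWSTRING is declared
// before KEY. Escape sequences are omitted (assumption, to keep this short).
var rules = []rule{
	{"STRING", regexp.MustCompile(`^(?:"[^"\\]*"|'[^'\\]*')`)},
	{"RAWSTRING", regexp.MustCompile(`^[0-9a-zA-Z]+`)},
	{"KEY", regexp.MustCompile(`^[.0-9a-zA-Z\-]+`)},
}

// classify returns the name of the rule that wins at the start of input:
// the longest match, with ties going to the rule declared first.
func classify(input string) string {
	best, bestLen := "", 0
	for _, r := range rules {
		// Only a strictly longer match replaces an earlier rule,
		// so the first declared rule wins on equal length.
		if m := r.re.FindString(input); len(m) > bestLen {
			best, bestLen = r.name, len(m)
		}
	}
	return best
}

func main() {
	for _, s := range []string{"akka.persistence", "journal", "'journal'"} {
		fmt.Printf("%s -> %s\n", s, classify(s))
	}
}
```

Under these assumptions, akka.persistence comes out as KEY because the dot makes the KEY match longer, journal comes out as RAWSTRING because of the tie, and 'journal' comes out as STRING.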
How to fix it? Well, I swapped the order of the lexer rules:
RAWSTRING
: (ESC | ALPHANUM)+
;
STRING
: '"' (ESC | ~ ["\\])* '"'
| '\'' (ESC | ~ ['\\])* '\''
;
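and changed pair to accept RAWSTRING as a key (the reordering alone should make no difference, since STRING and RAWSTRING can never match the same text):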
pair
: RAWSTRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}
Now it parses fine:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
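To see why the changed pair rule accepts this token stream, here is a rough hand-written sketch in Go. It is not the parser ANTLR generates: the `parser` type and the `parses` helper are hypothetical, only the pair, value and obj rules are modelled, and it assumes the KEY alternative of pair is kept next to the new RAWSTRING one.

```go
package main

import "fmt"

// parser walks a slice of token type names. keyTypes holds the token
// types that the pair rule accepts as a key.
type parser struct {
	toks     []string
	pos      int
	keyTypes map[string]bool
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return "EOF"
}

func (p *parser) accept(t string) bool {
	if p.peek() == t {
		p.pos++
		return true
	}
	return false
}

// pair : key KV? value
func (p *parser) pair() bool {
	if !p.keyTypes[p.peek()] {
		return false
	}
	p.pos++
	p.accept("KV") // the separator is optional
	return p.value()
}

// value : STRING | RAWSTRING | obj   (other alternatives omitted)
func (p *parser) value() bool {
	if p.accept("STRING") || p.accept("RAWSTRING") {
		return true
	}
	return p.obj()
}

// obj : '{' (pair (','? pair)*)? '}'
func (p *parser) obj() bool {
	if !p.accept("{") {
		return false
	}
	for first := true; p.peek() != "}"; first = false {
		if !first {
			p.accept(",")
		}
		if !p.pair() {
			return false
		}
	}
	return p.accept("}")
}

// parses reports whether toks form a single pair, given the key token types.
func parses(keyTypes, toks []string) bool {
	p := &parser{toks: toks, keyTypes: map[string]bool{}}
	for _, k := range keyTypes {
		p.keyTypes[k] = true
	}
	return p.pair() && p.peek() == "EOF"
}

func main() {
	// Token types copied from the dump above.
	toks := []string{"KEY", "{", "RAWSTRING", "{", "RAWSTRING", "KV", "STRING", "}", "}"}
	fmt.Println("original pair (STRING | KEY):", parses([]string{"STRING", "KEY"}, toks))
	fmt.Println("changed pair (RAWSTRING | KEY):", parses([]string{"RAWSTRING", "KEY"}, toks))
}
```

With the original key types the sketch stops at the RAWSTRING that follows the first '{', which corresponds to the "no viable alternative at input '{journal'" message; with RAWSTRING allowed as a key the whole stream is accepted.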