I am writing a parser in ANTLR 4 for the output of Clasp. A typical output looks like this:
clasp version 3.0.3
Reading from stdin
Solving...
Answer: 1
bird(a) bird(b) bird(c) penguin(d) bird(d)
Optimization: 7 0
Answer: 2
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(b) flies(b)
Optimization: 6 5
Answer: 3
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(c) flies(c)
Optimization: 2 5
Answer: 4
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(a) flies_abd(c) flies(a) flies(c)
Optimization: 1 10
Answer: 5
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(a) flies_abd(b) flies_abd(c) flies(a) flies(b) flies(c)
Optimization: 0 15
OPTIMUM FOUND
Models : 5
Optimum : yes
Optimization : 0 15
Calls : 1
Time : 0.002s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.000s
I have to check that clasp is version 3, so I am writing a grammar like the following:
/**
* Define a grammar for Clasp 3's output.
*/
grammar Output;
@header {package ac.bristol.clasp.parser;}
output:
version source solving answer* result separation statistics NEWLINE* EOF;
version: 'clasp version 3.' INT '.' INT NEWLINE;
source: 'Reading from stdin' NEWLINE # sourceSTDIN
| 'Reading from ' path NEWLINE # sourceFile;
path:
DRIVE? folder ( BSLASH folder )* filename # pathWindows
| FSLASH? folder ( FSLASH folder )* filename # pathNIX;
folder:
LETTER+ # genericFolder
| DOTDOT # parentFolder
| DOT # currentFolder;
solving: 'Solving...' NEWLINE;
filename:
LETTER+ extension?;
extension:
DOT LETTER*;
answer: 'Answer: ' INT NEWLINE //
model? NEWLINE //
'Optimization: ' INT ( SPACE INT )* NEWLINE;
model:
fact ( SPACE fact )*;
fact:
groundPredicate;
groundTermList:
groundTerm ( COMMA groundTerm )*;
groundTerm:
groundCompound | STRING | number | atom; // literal?
groundCompound:
groundPredicate
| groundExpression;
groundPredicate:
IDENTIFIER ( LROUND groundTermList RROUND )?;
groundExpression:
groundBits AND groundBits
| groundBits OR groundBits
| groundBits XOR groundBits;
groundBits:
groundCompare GT groundCompare
| groundCompare GE groundCompare
| groundCompare LT groundCompare
| groundCompare LE groundCompare;
groundCompare:
groundItem EQ groundItem
| groundItem NE groundItem;
groundItem:
groundFactor PLUS groundFactor
| groundFactor MINUS groundFactor;
groundFactor:
groundUnary TIMES groundUnary
| groundUnary DIVIDE groundUnary
| groundUnary MOD groundUnary;
groundUnary:
TILDE groundTerm
| MINUS groundTerm;
atom:
IDENTIFIER
| QUOTED;
number:
INT
| FLOAT;
//------------------------------------------------------------------------------
result: 'OPTIMUM FOUND' NEWLINE
| 'SATISFIABLE' NEWLINE
| 'UNKNOWN' NEWLINE;
separation:
NEWLINE;
statistics:
models optimum? optimization calls time cputime;
models: 'Models : ' INT SPACE* NEWLINE;
optimum: ' Optimum : yes' NEWLINE
| ' Optimum : no' NEWLINE;
optimization: 'Optimization : ' INT ( SPACE INT )* NEWLINE;
calls: 'Calls : ' INT NEWLINE;
time: 'Time : ' FLOAT 's (Solving: ' FLOAT 's 1st Model: ' FLOAT 's Unsat: ' FLOAT 's)' NEWLINE;
cputime: 'CPU Time : ' FLOAT 's';
//------------------------------------------------------------------------------
AND: '&';
BSLASH: '\\';
COLON: ':';
COMMA: ',';
DIVIDE: '/';
DOT: '.';
DOTDOT: '..';
EQ: '==';
FSLASH: '/';
GE: '>=';
GT: '>';
LE: '<=';
LROUND: '(';
LT: '<';
MINUS: '-';
MOD: '%';
NE: '!=';
OR: '?';
PLUS: '+';
RROUND: ')';
SEMICOLON: ';';
SPACE: ' ';
TILDE: '~';
TIMES: '*';
XOR: '^';
DRIVE: ( LOWER | UPPER ) COLON BSLASH?;
IDENTIFIER: LOWER FOLLOW*;
INT: DIGIT+;
FLOAT: DIGIT+ DOT DIGIT+;
NEWLINE: '\r'? '\n';
QUOTED: '\'' ( ~[\'\\] | ESCAPE )+? '\'';
STRING: '"' ( ~["\\] | ESCAPE )+? '"';
fragment DIGIT: [0] | NONZERO;
fragment ESCAPE: '\\' [btnr"\\] | '\\' [0-3]? [0-7]? [0-7] | '\\' 'u' [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F];
fragment FOLLOW: LOWER | UPPER | DIGIT | UNDERSCORE;
fragment LETTER: LOWER | UPPER | DIGIT | SPACE;
fragment LOWER: [a-z];
fragment NONZERO: [1-9];
fragment UNDERSCORE: [_];
fragment UPPER: [A-Z];
Please note that there is no skip rule for any part of the input stream, because I want to check every character.
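Please also note that I have terminal rules for INT and FLOAT, that INT is defined before FLOAT, and that the definition of FLOAT is the same as in Prolog.
The rule that parses the first line of the example above is the following:
version: 'clasp version 3.' INT '.' INT NEWLINE;
I wrote it this way because I have to check that the major version number of the clasp in use is 3, and then let the rest of the line read the minor version number, a dot, the build number, and a newline (with no spaces or anything else in between).
Unfortunately I get the following message, which makes me think ANTLR recognizes the minor version number, the dot, and the build number together as a FLOAT:
line 1:16 mismatched input '0.3' expecting INT
Could you please explain what is happening? Am I doing something I am not supposed to do, or is ANTLR applying an unwanted optimization?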
Answer 0 (score: 0)
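ANTLR first breaks your input into tokens, and only then parses those tokens. Your use of 'clasp version 3.' in a parser rule implicitly defines an anonymous token that matches that text string. The text following that token starts with 0.3, which matches FLOAT. The lexer does not know that the parser will be in the version rule at that point; it simply picks the longest token starting at the current position, and 0.3 as a FLOAT is longer than 0 as an INT. I recommend the following: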
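Split your grammar into parser grammar OutputParser; and lexer grammar OutputLexer;. In the parser grammar, use the tokenVocab option to indicate which lexer defines your tokens. This separation forces you to define real tokens for everything your grammar uses.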
options {
tokenVocab = OutputLexer;
}
Use FLOAT in place of INT '.' INT, or create a new token to represent the version:
VERSION
: DIGIT+ DOT DIGIT+ DOT DIGIT+
;
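The two suggestions above could fit together roughly as follows. This is a minimal sketch reduced to the version line only: the two-file layout follows the names given above, but the CLASP_VERSION token name and the simplified DIGIT fragment are assumptions made for illustration, and every other string literal in the original grammar would need a named token in the same way.

```antlr
// ---- File 1: OutputLexer.g4 (sketch; only the tokens needed for the first line) ----
lexer grammar OutputLexer;

// Hypothetical named token replacing the implicit literal 'clasp version 3.'
CLASP_VERSION: 'clasp version ';

// For the input "3.0.3", VERSION matches 5 characters while FLOAT matches
// only 3 ("3.0"), so the longest-match rule selects VERSION.
VERSION: DIGIT+ DOT DIGIT+ DOT DIGIT+;
FLOAT:   DIGIT+ DOT DIGIT+;
INT:     DIGIT+;
DOT:     '.';
NEWLINE: '\r'? '\n';

fragment DIGIT: [0-9];

// ---- File 2: OutputParser.g4 ----
parser grammar OutputParser;

options {
    tokenVocab = OutputLexer;   // take token definitions from the lexer grammar
}

// The whole version number now arrives as a single VERSION token.
version: CLASP_VERSION VERSION NEWLINE;
```

With this layout the grammar itself no longer enforces that the major version is 3. One option, again only an assumption about how the check might be done, is to inspect the text of the VERSION token after parsing (for example in a listener) and reject anything that does not start with 3.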