我关注this tutorial并成功复制了它的行为,除了我使用的是Antlr 4.7而不是教程使用的4.5。
我正在尝试为费用跟踪器构建DSL。
想知道每个元素是否都有属性?
E.g。这就是现在的样子
这是https://github.com/simkimsia/learn-antlr-web-js/blob/master/todo.g4
中所见的todo.g4的代码grammar todo;
elements
: (element|emptyLine)* EOF
;
element
: '*' ( ' ' | '\t' )* CONTENT NL+
;
emptyLine
: NL
;
NL
: '\r' | '\n'
;
CONTENT
: [a-zA-Z0-9_][a-zA-Z0-9_ \t]*
;
意思是说元素也会有2个属性,比如金额和收款人。为了简单起见,我将使用相同的句子结构,以便更容易地进行解析。
格式为pay [payee] [amount]
示例为pay Acme Corp 123,789.45
所以收款人是Acme Corp,金额是12378945,以整数表示,表示以美分为单位的金额
另一个例子是pay Banana Inc 700
所以收款人是Banana Inc,金额为70000,以整数表示,表示以美分为单位的金额
我猜我需要更改todo.g4然后重新生成解析器。
元素可以有其他属性吗? 如果是这样,我该如何开始?
更新
这是我最新的尝试排名最新的更新:
我刚想出如何使用grun和testRig。谢谢@Raven的提示。
最新尝试:我的最新费用.4(与之前的尝试仅有差异是付款的正则表达式)
grammar expense;
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: ([0-9]+(','[0-9]+)*)('.'[0-9]*)?;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
早先的尝试:这是我的费用.4
grammar expense;
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
答案 0 :(得分:2)
情况发生在10月24日。2017年19:00 UTC + 1。
你的语法很完美。我用Java做了一个完整的测试。
档案Expense.g4
:
grammar Expense;
payments
@init {System.out.println("Expense last update 1853");}
: (payment NL)*
;
payment
: PAY receiver amount=NUMBER
{System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
;
receiver
: surname=ID (lastname=ID)?
;
PAY : 'pay' ;
NUMBER : ([0-9]+(','[0-9]+)*)('.'[0-9]*)? ;
ID : [a-zA-Z0-9_]+ ;
NL : '\n' | '\r\n' ;
WS : [\t ]+ -> channel(HIDDEN) ; // keep the spaces (witout spaces ==> paydeltaco98)
档案ExpenseMyListener.java
:
public class ExpenseMyListener extends ExpenseBaseListener {
ExpenseParser parser;
public ExpenseMyListener(ExpenseParser parser) { this.parser = parser; }
public void exitPayments(ExpenseParser.PaymentsContext ctx) {
System.out.println(">>> in ExpenseMyListener for paymentsss");
System.out.println(">>> there are " + ctx.payment().size() + " elements in the list of payments");
for (int i = 0; i < ctx.payment().size(); i++) {
System.out.println(ctx.payment(i).getText());
}
}
public void exitPayment(ExpenseParser.PaymentContext ctx) {
System.out.println(">>> in ExpenseMyListener for payment");
System.out.println(parser.getTokenStream().getText(ctx));
}
}
档案test_expense.java
:
import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.tree.*;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
public class test_expense {
public static void main(String[] args) throws IOException {
ANTLRInputStream input = new ANTLRFileStream(args[0]);
ExpenseLexer lexer = new ExpenseLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExpenseParser parser = new ExpenseParser(tokens);
ParseTree tree = parser.payments();
System.out.println("---parsing ended");
ParseTreeWalker walker = new ParseTreeWalker();
ExpenseMyListener my_listener = new ExpenseMyListener(parser);
System.out.println(">>>> about to walk");
walker.walk(my_listener, tree);
}
}
输入文件top.text
:
pay Acme Corp 123,456
pay Banana Inc 456789.00
pay charlie pte 123,456.89
pay delta co 98
执行:
$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Expense.g4
$ javac Ex*.java
$ javac test_expense.java
$ grun Expense payments -tokens -diagnostics top.text
[@0,0:2='pay',<'pay'>,1:0]
[@1,3:3=' ',<WS>,channel=1,1:3]
[@2,4:7='Acme',<ID>,1:4]
[@3,8:8=' ',<WS>,channel=1,1:8]
[@4,9:12='Corp',<ID>,1:9]
...
[@32,90:89='<EOF>',<EOF>,5:0]
Expense last update 1853
Payement found 123,456 to Acme Corp
Payement found 456789.00 to Banana Inc
Payement found 123,456.89 to charlie pte
Payement found 98 to delta co
$ java test_expense top.text
Expense last update 1853
Payement found 123,456 to Acme Corp
Payement found 456789.00 to Banana Inc
Payement found 123,456.89 to charlie pte
Payement found 98 to delta co
---parsing ended
>>>> about to walk
>>> in ExpenseMyListener for payment
pay Acme Corp 123,456
>>> in ExpenseMyListener for payment
pay Banana Inc 456789.00
>>> in ExpenseMyListener for payment
pay charlie pte 123,456.89
>>> in ExpenseMyListener for payment
pay delta co 98
>>> in ExpenseMyListener for paymentsss
>>> there are 4 elements in the list of payments
payAcmeCorp123,456
payBananaInc456789.00
paycharliepte123,456.89
paydeltaco98
答案 1 :(得分:1)
我不完全确定你想要什么,但是对于提供的例子,这个语法应该可以完成这项工作:
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
如果这是您要求的,我会在需要时添加更多解释......
答案 2 :(得分:1)
我猜我需要更改todo.g4然后重新生成 解析器。
当然每次更改后都会重新生成。对我而言:
$ a4 Question.g4
$ javac Q*.java
$ grun Question elements -tokens -diagnostics t.text
,其中
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
您描述的具体内容越多,您就越容易出现歧义问题。例如,您有两个规则:
payment : 'pay' [payee] [amount]
free_text : ... any character ...
考虑以下内容:
* pay Federico Tomassetti 10 € for the tutorial
* pay Federico Tomassetti 10
含糊不清,可以与两个规则匹配,但最终会被解析为自由文本,因为€ for the tutorial
不满足payment
。
如果稍后您更改payment
规则以接受金额后的更多信息:
payment : 'pay' [payee] [amount] payment_info
以上内容将由payment
匹配(如果不明确,ANTLR会选择第一条规则)。好消息是ANTLR 4非常强大,可以消除歧义,如有必要,它会读取整个文件。
对于含糊不清的令牌和优先规则,请阅读过去三周的帖子,已经说了很多。
将Raven的语法与你的语法混合,这是一种可能的解决方案:
档案Question.g4
grammar Question;
elements
@init {System.out.println("Question last update 1432");}
: ( element | emptyLine )* EOF
;
element
: '*' content NL
;
content
: payment //{System.out.println("Payement found " + $payment.text);}
| free_text {System.out.println("Free text found " + $free_text.text);}
;
payment
: PAY receiver amount=NUMBER
{System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
;
receiver
: surname=WORD ( lastname=WORD )?
;
free_text
: ( WORD | PAY | NUMBER )+
;
emptyLine
: NL
;
PAY : 'pay' ;
WORD : LETTER ( LETTER | DIGIT | '_' )* ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
NL : [\r\n]
| '\r\n'
;
//WS : [ \t]+ -> skip ; // $payment.text => payAcmeCorp123,789.45
WS : [ \t]+ -> channel(HIDDEN) ; // spaces are needed to nicely display $payment.text
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
档案t.text
* play with ANTLR 4
* write a tutorial
* pay Acme Corp 123,789.45
* pay Banana Inc 700
* pay Federico Tomassetti 10 € for the tutorial
执行:
$ grun Question elements -tokens -diagnostics t.text
line 5:29 token recognition error at: '€'
[@0,0:0='*',<'*'>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:5='play',<WORD>,1:2]
[@3,6:6=' ',<WS>,channel=1,1:6]
[@4,7:10='with',<WORD>,1:7]
[@5,11:11=' ',<WS>,channel=1,1:11]
[@6,12:16='ANTLR',<WORD>,1:12]
[@7,17:17=' ',<WS>,channel=1,1:17]
[@8,18:18='4',<NUMBER>,1:18]
[@9,19:19='\n',<NL>,1:19]
[@10,20:20='*',<'*'>,2:0]
[@11,21:21=' ',<WS>,channel=1,2:1]
[@12,22:26='write',<WORD>,2:2]
[@13,27:27=' ',<WS>,channel=1,2:7]
[@14,28:28='a',<WORD>,2:8]
[@15,29:29=' ',<WS>,channel=1,2:9]
[@16,30:37='tutorial',<WORD>,2:10]
[@17,38:38='\n',<NL>,2:18]
...
[@56,136:135='<EOF>',<EOF>,7:0]
Question last update 1432
Free text found play with ANTLR 4
Free text found write a tutorial
line 3:26 reportAttemptingFullContext d=2 (content), input='pay Acme Corp 123,789.45
'
...
Payement found 700 to Banana Inc
Free text found pay Federico Tomassetti 10 for the tutorial
如您所见,€符号无法识别。您可能需要一个类似于CONTENT
here的FIELDTEXT
规则,然后您就会遇到麻烦......
Federico的Mega tutorial是一个好的开始。有关详细信息,请参阅The Definitive ANTLR 4 Reference或在线文档from www.antlr.org。