I recently started using ANTLR to generate a simple parser for interpolated strings. Here are some sample input strings (one per line):
Hello {User.Name}!
Welcome on Planet {GetPlanetName(" A stupid string param :-} ")}
Plain String without an interpolated expression
String with escaped {{ brackets }}
The grammar that decides whether something is a plain string (plainString) or an expression (expressionString) looks like this:
grammar T;
patternString: (plainString | expressionString)+
;
plainString: (CBO_ESCAPESEQUENCE | CBC_ESCAPESEQUENCE | PLAINSTRINGLITERAL)+
;
expressionString: CBO expression CBC | CURLYBRACKETS_EMPTY
;
expression: expressionSegment+
;
expressionSegment: ~('"' | '\'' | '{' | '(' | '[' | '}' | ')' | ']' | CBO_ESCAPESEQUENCE | CBC_ESCAPESEQUENCE)+
| '(' expressionSegment+ ')' | '(' WS ')' | '()'
| '[' expressionSegment+ ']' | '[' WS ']' | '[]'
| '{' expressionSegment+ '}' | CURLYBRACKETS_EMPTY
| stringLiteral
| charLiteral
;
stringLiteral: '"' (~('"') | '\\"')+ '"'
| '""'
;
charLiteral: '\'' (~('\'') | '\\\'')+ '\''
;
fragment WS: (' ' | '\r' | '\n' | '\t')+;
PLAINSTRINGLITERAL: ~('{' | '}');
CURLYBRACKETS_EMPTY: (CBO WS CBC | CBO CBC);
CBO: '{';
CBC: '}';
fragment CBO_ESCAPESEQUENCE: '{{';
fragment CBC_ESCAPESEQUENCE: '}}';
This works very well, except for strings such as the following:
{{{new [] {1,2,3,4}}}}
This gives me the following AST:
PatternString => '{{{new[]{1, 2, 3, 4}}}}'
  ExpressionString => '{{{new[]{1, 2, 3, 4}}}}'
    Expression => '{{new[]{1, 2, 3, 4}}}'
      ExpressionSegment => '{{new[]{1, 2, 3, 4}}}'
        ExpressionSegment => '{new[]{1, 2, 3, 4}}'
          ExpressionSegment => 'new[]'
          ExpressionSegment => '{1, 2, 3, 4}'
            ExpressionSegment => '1, 2, 3, 4'
whereas I expected (and wanted) the following AST:
PatternString => '{{{new[]{1, 2, 3, 4}}}}'
  PlainString => '{{'
  ExpressionString => '{new[]{1, 2, 3, 4}}'
    Expression => 'new[]{1, 2, 3, 4}'
      ExpressionSegment => 'new[]'
      ExpressionSegment => '{1, 2, 3, 4}'
        ExpressionSegment => '1, 2, 3, 4'
  PlainString => '}}'
In other words, plainString should be greedier and consume escaped brackets wherever it can. How can I fix this in the grammar above?
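For reference, the segmentation I am after can be sketched outside ANTLR. The following is only a hand-written Python illustration of the intended behavior, not part of the grammar: the function names (split_interpolated, _find_expression_end, _skip_quoted) are made up for this example, and it assumes that a doubled brace at the top level is always an escape and that string and char literals use backslash escapes.

```python
def _skip_quoted(text, start):
    """Return the index just past the quoted literal that starts at `start`."""
    quote = text[start]
    i = start + 1
    while i < len(text):
        if text[i] == "\\":      # backslash escape: skip the escaped character too
            i += 2
            continue
        if text[i] == quote:
            return i + 1
        i += 1
    raise ValueError("unterminated literal")


def _find_expression_end(text, start):
    """Return the index of the '}' that closes an expression whose body starts at `start`."""
    depth = 1
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "\"'":          # braces inside literals do not count
            i = _skip_quoted(text, i)
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated expression")


def split_interpolated(text):
    """Split `text` into ('plain', s) and ('expr', s) segments."""
    segments, plain = [], []
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two in ("{{", "}}"):  # greedy: escaped braces are tried first and stay plain
            plain.append(two)
            i += 2
        elif text[i] == "{":
            if plain:
                segments.append(("plain", "".join(plain)))
                plain = []
            end = _find_expression_end(text, i + 1)
            segments.append(("expr", text[i + 1:end]))
            i = end + 1
        elif text[i] == "}":
            raise ValueError("unbalanced '}' at index %d" % i)
        else:
            plain.append(text[i])
            i += 1
    if plain:
        segments.append(("plain", "".join(plain)))
    return segments


print(split_interpolated("{{{new [] {1,2,3,4}}}}"))
# [('plain', '{{'), ('expr', 'new [] {1,2,3,4}'), ('plain', '}}')]
```

Because the escape sequences are tried before a single brace at the top level, the leading '{{' and trailing '}}' end up in plain segments, which is the shape of the AST I expected.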
Answer 0 (score: 3)
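I think your problem comes from defining explicit lexer rules for the opening and closing curly braces and then also referring to those braces as string literals in some parser rules. Changing the expressionSegment rule so that it references only lexer rules seems to resolve it. Please try this grammar and see whether it fixes your problem: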
expressionString: CBO expression CBC | CURLYBRACKETS_EMPTY
;
expression: expressionSegment+
;
expressionSegment:
L_PAREN expressionSegment+ R_PAREN
| L_BRACKET expressionSegment+ R_BRACKET
| CBO expressionSegment+ CBC
| L_PAREN WS R_PAREN
| L_BRACKET WS R_BRACKET
| L_PAREN R_PAREN
| L_BRACKET R_BRACKET
| CURLYBRACKETS_EMPTY
| stringLiteral
| charLiteral
| ~(DOUBLE_QUOTE | SINGLE_QUOTE | CBC | CBO | L_PAREN | L_BRACKET | R_PAREN | R_BRACKET)+
;
stringLiteral: '"' (~('"') | '\\"')+ '"'
| '""'
;
charLiteral: '\'' (~('\'') | '\\\'')+ '\''
;
WS: (' ' | '\r' | '\n' | '\t')+;
PLAINSTRINGLITERAL: ~('{' | '}');
CURLYBRACKETS_EMPTY: (CBO WS CBC | CBO CBC);
CBO: '{';
CBC: '}';
L_PAREN: '(';
R_PAREN: ')';
L_BRACKET: '[';
R_BRACKET: ']';
SINGLE_QUOTE: '\'';
DOUBLE_QUOTE: '"';
As you can see, the parse tree seems to reflect what you are looking for.