Question

I'm working on a grammar that is basically an island grammar.

Let's say the "island" is everything between braces, and the "sea" is everything that is not. Like this:

{ (island content) }

Then this simple grammar works:

IslandStart
:
    '{' -> pushMode(Island)
;

Fluff
:
    ~[\{\}]+
;

....

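But I'm having trouble finding a similar solution for the case where I want a complex (multi-character) opening for my "island" block, like this:

{# (island content) }

In this case I don't know how to write a rule for "Fluff" (everything except my opening sequence).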

IslandStart
    :
        '{#' -> pushMode(Island)
    ;

Fluff
    :
        ~[\{\}]+ /* Should now include opening braces as well 
                    if they are not immediately followed by # sign */
    ;

How can I make this work?

EDIT: GRosenberg proposed a solution, but I get a lot of tokens (one per character). Here is an example that demonstrates this behavior:

My lexer grammar:

lexer grammar Demolex;

IslandStart
    :
        '{$' -> pushMode(Island)
    ;


Fluff
    : 
          '{' ~'$' .* // any 2+ char seq that starts with '{', but not '{$'
        | '{' '$$' .* // starts with hypothetical not IslandStart marker
        | '{'         // just the 1 char 
        | .*? ~'{'    // minimum sequence that ends before an '{'
    ;

mode Island;

IslandEnd
    :
        '}' -> popMode
    ;

The simplest possible parser grammar:

grammar Demo;
options { tokenVocab = Demolex; }

template
    :
        Fluff+
    ;

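This generates a tree with many tokens from the input "somanytokens" when I debug it in the ANTLR4 plugin for Eclipse.

It's unlikely to be a plugin problem. I can easily come up with a token definition that results in one big fat token in the tree.

In fact, even the simplest form of the grammar gives this result: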

grammar Demo2;

template4
    :
        Fluff+
    ;

Fluff
    : 
         .*? ~'{'    // minimum sequence that ends before an '{'
    ;

Answer 1

You just need to specify the complement of the sequence differences:

IslandStart : '{#' -> pushMode(Island) ;

Fluff       : '{' ~'#' .* // any 2+ char seq that starts with '{', but not '{#'
            | '{' '##' .* // starts with hypothetical not IslandStart marker
            | '{'         // just the 1 char 
            | .*? ~'{'    // minimum sequence that ends before an '{'
            ;

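Fluff alt2 works by being a longer match than IslandStart. Fluff alt3 only applies when IslandStart and Fluff alt1 fail to match a character sequence starting with '{'. Fluff alt4 is the catchall for content up to but not including a '{', which lets the lexer consider sequences aligned on a '{'.

Update

Let's make it a more reasonably complete example grammar: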

parser grammar TestParser;

options{
    tokenVocab=TestLexer;
}

template : ( Fluff | Stuff )+ EOF ;

and

lexer grammar TestLexer;

IslandStart : '{' '$' -> pushMode(Island),more ;

Fluff : '{' ~'$' ~'{'*? '}'     // any 2+ char seq that starts with '{', but not '{$'
      | '{' '$' '$' ~'{'*? '}'  // or starts with hypothetical not IslandStart marker
      | '{' '}'                 // just the empty pair
      | ~'{'+                   // minimum sequence that ends before an '{'
      ;

mode Island;

Stuff : '}' -> popMode ;
Char  : .   -> more    ;

Input: so{$Island}many{}tokens{$$notIsland}and{inner}end

Token dump:

Fluff: [@0,0:1='so',<1>,1:0]
Stuff: [@1,2:10='{$Island}',<2>,1:2]
Fluff: [@2,11:14='many',<1>,1:11]
Fluff: [@3,15:16='{}',<1>,1:15]
Fluff: [@4,17:22='tokens',<1>,1:17]
Fluff: [@5,23:35='{$$notIsland}',<1>,1:23]
Fluff: [@6,36:38='and',<1>,1:36]
Fluff: [@7,39:45='{inner}',<1>,1:39]
Fluff: [@8,46:48='end',<1>,1:46]

Parse tree:

(template so {$Island} many {} tokens {$$notIsland} and {inner} end <EOF>)
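
As a rough cross-check of the token dump above, the following Python sketch emulates the lexer with regular expressions. It is not ANTLR: the `MODES` table, the action names, and `tokenize` are hypothetical stand-ins, and it assumes that "longest match wins, first rule wins ties" is a fair approximation of how the lexer chooses among these particular rules.

```python
import re

# Hypothetical regex stand-ins for the TestLexer rules.
# Each entry is (token type, pattern, action).
MODES = {
    "DEFAULT": [
        ("IslandStart", r"\{\$", "push_more"),    # '{' '$' -> pushMode(Island), more
        ("Fluff", r"\{[^$][^{]*?\}", None),       # '{' ~'$' ~'{'*? '}'
        ("Fluff", r"\{\$\$[^{]*?\}", None),       # '{' '$' '$' ~'{'*? '}'
        ("Fluff", r"\{\}", None),                 # '{' '}'
        ("Fluff", r"[^{]+", None),                # ~'{'+
    ],
    "Island": [
        ("Stuff", r"\}", "pop"),                  # '}' -> popMode
        ("Char", r".", "more"),                   # .   -> more
    ],
}

def tokenize(text):
    """Return (type, text, start, stop) tuples, stop index inclusive."""
    tokens = []
    modes = ["DEFAULT"]
    pos = 0
    start = 0  # where the pending token began (kept across `more`)
    while pos < len(text):
        best = None
        for name, pattern, action in MODES[modes[-1]]:
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if m and (best is None or m.end() > best[0]):
                best = (m.end(), name, action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        end, name, action = best
        if action == "push_more":
            modes.append("Island")
        elif action == "pop":
            modes.pop()
        pos = end
        if action in ("push_more", "more"):
            continue  # keep accumulating text for the next emitted token
        tokens.append((name, text[start:end], start, end - 1))
        start = end
    return tokens

for tok in tokenize("so{$Island}many{}tokens{$$notIsland}and{inner}end"):
    print(tok)
```

Under those assumptions the sketch yields the same nine tokens and offsets as the dump: `{$Island}` comes out as one `Stuff` token because the `more` actions keep extending the pending text until `'}'` pops the mode, and `{$$notIsland}` stays `Fluff` because its alternative is a longer match than `IslandStart`.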

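The operation of the lexer rules remains the same. The changes were made to accommodate proper matching of the terminals. The simplified alt4 works as originally intended. I'm not entirely sure why ANTLR had a problem with the original form, but simpler is better in any case.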
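
One plausible reading of the one-token-per-character result, offered as an assumption rather than a statement about ANTLR's internals: a non-greedy `.*?` followed by `~'{'` is satisfied as soon as a single non-brace character is available, so every token ends after one character. The Python sketch below uses `re` patterns as stand-ins for the two rule bodies; the helper name `split_with_rule` is hypothetical.

```python
import re

def split_with_rule(pattern, text):
    """Apply one rule repeatedly from left to right, one token per match."""
    rule = re.compile(pattern, re.DOTALL)
    tokens, pos = [], 0
    while pos < len(text):
        m = rule.match(text, pos)
        if m is None or m.end() == pos:
            raise ValueError(f"no token at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

# Stand-in for  .*? ~'{'  : the lazy prefix matches nothing, so each token is one char.
print(split_with_rule(r".*?[^{]", "somanytokens"))

# Stand-in for  ~'{'+  : the greedy complement set swallows the whole run.
print(split_with_rule(r"[^{]+", "somanytokens"))
```

The first call splits the input into twelve single-character tokens, which mirrors the behavior reported in the question; the second returns the whole string as one token, which is what the simplified alt4 relies on.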

Switching to island mode on a multi-character token