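Semantically disambiguating an ambiguous syntax

Date: 2016-10-17 21:54:09

Tags: antlr4

Using ANTLR 4, I have a situation I do not know how to solve. I originally asked this on the ANTLR discussion forum at https://groups.google.com/forum/#!topic/antlr-discussion/1yxxxAvU678, but that forum does not seem to get much traffic, so I am asking again here.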

I have the following grammar:

expression
 : ...
 | path
 ;

path
 : ...
 | dotIdentifierSequence
 ;

dotIdentifierSequence
 : identifier (DOT identifier)*
 ;

The problem here is that dotIdentifierSequence can mean many things semantically, and not all of them are "paths". At the moment, however, they are all recognized as paths in the parse tree, and I then have to special-case them in the visitor.

What I would really like is a way to express the usages of dotIdentifierSequence that are not paths in the expression rule rather than in the path rule, while still keeping dotIdentifierSequence in path to handle the path usages.

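To be clear, a dotIdentifierSequence could be any of the following:

  1. A path - this is a SQL-like grammar, and a path expression is like a table or column reference in SQL, e.g. a.b.c
  2. A Java class name - e.g. com.acme.SomeJavaType
  3. A static Java field reference
  4. A Java enum value reference

The idea is that during visitation, "dotIdentifierSequence as a path" resolves to a completely different type than the other usages.

Any idea how I can do this?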

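1 answer:

Answer 0 (score: 1)

The problem here is that you are trying to distinguish between the kinds of "path" while they are being created in the parser. Constructing the paths inside the lexer would be easier (a rough sketch follows):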

grammar T;

tokens {
  JAVA_TYPE_PATH,
  JAVA_FIELD_PATH
}

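@lexer::members {
  // True if a class with this fully qualified name can be loaded.
  boolean isJavaClass(String name) {
    try {
      Class.forName(name);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // True if the text before the last dot names a class that has a public field named by the rest.
  boolean isJavaField(String name) {
    int lastDot = name.lastIndexOf('.');
    try {
      return lastDot > 0
          && Class.forName(name.substring(0, lastDot)).getField(name.substring(lastDot + 1)) != null;
    } catch (ReflectiveOperationException | LinkageError e) {
      return false;
    }
  }
}

// parser rules go here

PATH
 : IDENTIFIER ('.' IDENTIFIER)*
   {
     String s = getText();
     if (isJavaClass(s)) {
       setType(JAVA_TYPE_PATH);
     } else if (isJavaField(s)) {
       setType(JAVA_FIELD_PATH);
     }
   }
 ;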

fragment IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;

Then, in the parser, do:

expression
 : JAVA_TYPE_PATH   #javaTypeExpression
 | JAVA_FIELD_PATH  #javaFieldExpression  
 | PATH             #pathExpression
 ;

But, of course, input like java./*comment*/lang.String would then be tokenized incorrectly.
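To make that concrete, here is a minimal sketch in plain Java (no ANTLR involved) that mimics the PATH lexer rule with a regular expression. The class name LexerPathDemo and the method leadingPath are made up for illustration, and the pattern is only assumed to be equivalent to IDENTIFIER ('.' IDENTIFIER)*. Because such a rule works on raw characters, a comment between the dot and the next identifier cuts the match short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the grammar above.
class LexerPathDemo {

    // Assumed regex equivalent of: IDENTIFIER ('.' IDENTIFIER)*
    private static final Pattern PATH = Pattern.compile(
        "[a-zA-Z_][a-zA-Z_0-9]*(?:\\.[a-zA-Z_][a-zA-Z_0-9]*)*");

    // Returns the text a PATH-like rule would match at the start of the input,
    // or null if the input does not start with an identifier.
    static String leadingPath(String input) {
        Matcher m = PATH.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingPath("java.lang.String"));
        System.out.println(leadingPath("java./*comment*/lang.String"));
    }
}
```

The first call returns the whole path, while the second stops at java, so a class lookup on the token text would never see java.lang.String.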

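Handling it all in the parser would mean manually looking ahead in the token stream and checking whether a Java type or field exists.

A quick demo: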

grammar T;

@parser::members {

  String getPathAhead() {

    Token token = _input.LT(1);

    if (token.getType() != IDENTIFIER) {
      return null;
    }

    StringBuilder builder = new StringBuilder(token.getText());

    // Try to collect ('.' IDENTIFIER)*
    for (int stepsAhead = 2; ; stepsAhead += 2) {

      Token expectedDot = _input.LT(stepsAhead);
      Token expectedIdentifier = _input.LT(stepsAhead + 1);

      if (expectedDot.getType() != DOT || expectedIdentifier.getType() != IDENTIFIER) {
        break;
      }

      builder.append('.').append(expectedIdentifier.getText());
    }

    return builder.toString();
  }

  boolean javaTypeAhead() {

    String path = getPathAhead();

    if (path == null) {
      return false;
    }

    try {
      return Class.forName(path) != null;
    } catch (Exception e) {
      return false;
    }
  }

  boolean javaFieldAhead() {

    String path = getPathAhead();

    if (path == null || !path.contains(".")) {
      return false;
    }

    int lastDot = path.lastIndexOf('.');
    String typeName = path.substring(0, lastDot);
    String fieldName = path.substring(lastDot + 1);

    try {
      Class<?> clazz = Class.forName(typeName);
      return clazz.getField(fieldName) != null;
    } catch (Exception e) {
      return false;
    }
  }
}

expression
 : {javaTypeAhead()}?  path    #javaTypeExpression
 | {javaFieldAhead()}? path    #javaFieldExpression
 | path                        #pathExpression
 ;

path
 : dotIdentifierSequence
 ;

dotIdentifierSequence
 : IDENTIFIER (DOT IDENTIFIER)*
 ;

IDENTIFIER
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

DOT
 : '.'
 ;

which can be tested with the following class:

package tl.antlr4;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Main {

    public static void main(String[] args) {

        String[] tests = {
            "mu",
            "tl.antlr4.The",
            "java.lang.String",
            "foo.bar.Baz",
            "tl.antlr4.The.answer",
            "tl.antlr4.The.ANSWER"
        };

        for (String test : tests) {
            TLexer lexer = new TLexer(new ANTLRInputStream(test));
            TParser parser = new TParser(new CommonTokenStream(lexer));
            ParseTreeWalker.DEFAULT.walk(new TestListener(), parser.expression());
        }
    }
}

class TestListener extends TBaseListener {

    @Override
    public void enterJavaTypeExpression(@NotNull TParser.JavaTypeExpressionContext ctx) {
        System.out.println("JavaTypeExpression  -> " + ctx.getText());
    }

    @Override
    public void enterJavaFieldExpression(@NotNull TParser.JavaFieldExpressionContext ctx) {
        System.out.println("JavaFieldExpression -> " + ctx.getText());
    }

    @Override
    public void enterPathExpression(@NotNull TParser.PathExpressionContext ctx) {
        System.out.println("PathExpression      -> " + ctx.getText());
    }
}

class The {
    public static final int ANSWER = 42;
}

which will print the following to the console:

PathExpression      -> mu
JavaTypeExpression  -> tl.antlr4.The
JavaTypeExpression  -> java.lang.String
PathExpression      -> foo.bar.Baz
PathExpression      -> tl.antlr4.The.answer
JavaFieldExpression -> tl.antlr4.The.ANSWER
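
The demo covers Java types and fields but not the enum case from the question. Below is a minimal, ANTLR-free sketch of how all four kinds might be told apart with reflection. The names PathClassifier, Kind and classify are hypothetical, and the sketch assumes the candidate classes are on the classpath of the running JVM. An enum constant is itself a public static field, so it is checked before the plain static-field case.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical classifier, for illustration only.
class PathClassifier {

    enum Kind { JAVA_TYPE, JAVA_ENUM_VALUE, JAVA_STATIC_FIELD, PATH }

    static Kind classify(String dotted) {
        if (findClass(dotted) != null) {
            return Kind.JAVA_TYPE;
        }
        int lastDot = dotted.lastIndexOf('.');
        if (lastDot > 0) {
            // Treat everything before the last dot as the owning class name.
            Class<?> owner = findClass(dotted.substring(0, lastDot));
            if (owner != null) {
                try {
                    Field field = owner.getField(dotted.substring(lastDot + 1));
                    if (field.isEnumConstant()) {
                        return Kind.JAVA_ENUM_VALUE;
                    }
                    if (Modifier.isStatic(field.getModifiers())) {
                        return Kind.JAVA_STATIC_FIELD;
                    }
                } catch (NoSuchFieldException e) {
                    // Not a public field of the owner: fall through to PATH.
                }
            }
        }
        return Kind.PATH;
    }

    // Returns the class with the given name, or null if it cannot be loaded.
    private static Class<?> findClass(String name) {
        try {
            // initialize=false: look the class up without running its static initializers.
            return Class.forName(name, false, PathClassifier.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {
                "java.lang.String", "java.lang.Integer.MAX_VALUE",
                "java.util.concurrent.TimeUnit.SECONDS", "foo.bar.Baz"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

In the grammar, a check like this could back an additional predicate next to javaTypeAhead() and javaFieldAhead(), placed before the field alternative. Nested classes written with dots would fall through to PATH in this sketch, since Class.forName expects their binary names.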