LOOKAHEADs for the JavaScript/ECMAScript array literal production

Time: 2014-11-13 12:16:52

Tags: javascript parsing grammar ecmascript-5 javacc

I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC and have a problem with the ArrayLiteral production.

ArrayLiteral :
    [ Elision_opt ]
    [ ElementList ]
    [ ElementList , Elision_opt ]

ElementList :
    Elision_opt AssignmentExpression
    ElementList , Elision_opt AssignmentExpression

Elision :
    ,
    Elision ,

I have three questions, which I will ask one at a time.

This is the second one.

I have simplified this production to the following form:

ArrayLiteral:
    "[" ("," | AssignmentExpression ",") * AssignmentExpression ? "]"

Please see the first question for whether this simplification is correct:

How to simplify JavaScript/ECMAScript array literal production?

Now I am trying to implement it in JavaCC as follows:

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

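JavaCC complains that the choice between "," and AssignmentExpression (and its contents) is ambiguous. Obviously, a LOOKAHEAD specification is needed. I have spent a lot of time trying to figure out the LOOKAHEADs and tried different things, such as

  • LOOKAHEAD (AssignmentExpression() ",") in (...)*
  • LOOKAHEAD (AssignmentExpression() "]") in (...)?

and a few other variations, but I could not get rid of the JavaCC warning.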

I do not understand why this does not work:

void ArrayLiteral() :
{
}
{
    "["
    (
        LOOKAHEAD ("," | AssignmentExpression() ",")
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

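OK, AssignmentExpression() on its own is ambiguous, but the trailing "," or "]" in the LOOKAHEADs should make it clear which alternative to take. Or am I mistaken?

What would a correct LOOKAHEAD specification for this production look like?

Update

Unfortunately, this did not work: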

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |
        LOOKAHEAD (AssignmentExpression() ",")
        AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

The warning:

Warning: Choice conflict in (...)* construct at line 6, column 5.
         Expansion nested within construct and expansion following construct
         have common prefixes, one of which is: "function"
         Consider using a lookahead of 2 or more for nested expansion.

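Line 6 is the ( before the first LOOKAHEAD. The common prefix "function" is just one of the possible starts of AssignmentExpression.

3 answers:

Answer 0 (score: 2)

JavaCC generates top-down parsers. I will say up front that I am not a fan of top-down parser generators, so I am not a JavaCC expert, and I do not have it handy to test.

(Edit: I thought something else would work, but then I realized that I do not understand how JavaCC picks between alternatives. In the case of ( A | B )* C there are actually three possible choices: A, B and C. I thought it would consider all three, but it may only compare two at a time. So the following is another guess.)

That said, I think the following would work, but it involves parsing almost every AssignmentExpression() twice.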

{
    "["
    (
        ","
    |
        AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

As I pointed out in the linked question, a better solution is to rewrite the production differently:

"[" AssignmentExpression ? ("," AssignmentExpression ?) * "]"

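This results in a grammar that needs only one token of lookahead, so you do not need any LOOKAHEAD declarations to handle it.

Answer 1 (score: 1)

Here is another approach. It has the advantage of identifying which commas denote undefined elements without using any semantic actions.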
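To see that the rewritten production accepts the same token sequences as the simplified form from the question, a brute-force comparison can help. The following is only a sketch under stated assumptions: each AssignmentExpression is collapsed to the single stand-in character e, both productions are transcribed as regular expressions, and the class name ArrayLiteralForms and method compareUpTo are hypothetical names introduced for this illustration.

```java
import java.util.regex.Pattern;

public class ArrayLiteralForms {
    // Assumption: 'e' stands for one complete AssignmentExpression.
    // Form from the question: "[" ("," | AssignmentExpression ",")* AssignmentExpression? "]"
    static final Pattern ORIGINAL = Pattern.compile("\\[(,|e,)*e?\\]");
    // Form from this answer: "[" AssignmentExpression? ("," AssignmentExpression?)* "]"
    static final Pattern REWRITTEN = Pattern.compile("\\[e?(,e?)*\\]");

    /**
     * Compares both forms on every bracketed string over {e, ","} whose
     * body has at most maxLen symbols. Returns the number of strings
     * checked and throws if the two forms ever disagree.
     */
    static int compareUpTo(int maxLen) {
        int checked = 0;
        for (int len = 0; len <= maxLen; len++) {
            // Each bit pattern selects 'e' (0) or ',' (1) per position.
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder("[");
                for (int i = 0; i < len; i++) {
                    sb.append(((bits >> i) & 1) == 0 ? 'e' : ',');
                }
                String s = sb.append(']').toString();
                boolean a = ORIGINAL.matcher(s).matches();
                boolean b = REWRITTEN.matcher(s).matches();
                if (a != b) {
                    throw new IllegalStateException("forms disagree on " + s);
                }
                checked++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("checked " + compareUpTo(12) + " token strings");
    }
}
```

Both forms reject exactly the bodies in which two expressions are adjacent without a comma between them. The difference is that in the rewritten form every decision point can be resolved by looking at a single token ("," versus "]" versus the start of an expression).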

void ArrayLiteral() : {} { "[" MoreArrayLiteral() }

void MoreArrayLiteral() : {} {
    "]"
|    "," /* undefined item */ MoreArrayLiteral()
|    AssignmentExpression() ( "]" |  "," MoreArrayLiteral() )
}
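To make the shape of this grammar concrete, here is a hand-written recursive-descent sketch in plain Java that mirrors the three alternatives of MoreArrayLiteral. It is not JavaCC output and rests on assumptions: a single letter stands in for a whole AssignmentExpression, there is no whitespace handling, and the class name MoreArrayLiteralDemo and its helper methods are hypothetical. Undefined elements are recorded as null.

```java
import java.util.ArrayList;
import java.util.List;

public class MoreArrayLiteralDemo {
    private final String src;
    private int pos = 0;
    private final List<String> elements = new ArrayList<>();

    private MoreArrayLiteralDemo(String src) {
        this.src = src;
    }

    /** Parses e.g. "[a,,b]" and returns the elements, with null for holes. */
    static List<String> parse(String src) {
        MoreArrayLiteralDemo p = new MoreArrayLiteralDemo(src);
        p.expect('[');
        p.moreArrayLiteral();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("trailing input at " + p.pos);
        }
        return p.elements;
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
        pos++;
    }

    // Mirrors: "]" | "," MoreArrayLiteral()
    //        | AssignmentExpression() ( "]" | "," MoreArrayLiteral() )
    private void moreArrayLiteral() {
        char c = peek();
        if (c == ']') {
            pos++;                               // end of the literal
        } else if (c == ',') {
            pos++;
            elements.add(null);                  // comma with no expression before it: a hole
            moreArrayLiteral();
        } else if (Character.isLetter(c)) {      // stand-in for AssignmentExpression
            pos++;
            elements.add(String.valueOf(c));
            if (peek() == ']') {
                pos++;
            } else {
                expect(',');                     // this comma only separates, it is not a hole
                moreArrayLiteral();
            }
        } else {
            throw new IllegalArgumentException("unexpected input at " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("[a,,b]"));
    }
}
```

Each branch is selected by the next character alone, which is why the JavaCC version needs no LOOKAHEAD. A comma reached directly from MoreArrayLiteral is always an undefined element, while a comma that follows an expression is only a separator.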

Answer 2 (score: 0)

This is how I solved it (thanks to @rici's answer):

JSArrayLiteral ArrayLiteral() : 
{
    boolean lastElementWasAssignmentExpression = false;
}
{
    "["
    (
        (
            AssignmentExpression()
            {
                // Do something with expression
                lastElementWasAssignmentExpression = true;
            }
        ) ?
        (
            ","
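            {
                if (!lastElementWasAssignmentExpression)
                {
                    // Do something with elision
                }
                // Reset the flag so that the next "," counts as an elision
                // unless another expression is parsed in between
                lastElementWasAssignmentExpression = false;
            }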
            (
                AssignmentExpression()
                {
                    // Do something with expression
                    lastElementWasAssignmentExpression = true;
                }
            ) ?
        ) *
    )
    "]"
    {
        // Do something with results
    }
}