Question

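In my grammar, I want to allow two syntaxes for strings:

The classic way, "my \"string\"", which is not a problem here.
A new way with an arbitrary escape boundary: |"my "string"|", |x"my |"string"|x". The goal is to keep the string content free of any escaping, for example so that a JS snippet embedded in an x(ht)ml file never ends up containing things like a &amp;&amp; b.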

In spirit, I would like to express something like this:

'|' {$Boundary} '"' {AnyCharSequenceExcept('|' $Boundary '"')} '|' {$Boundary} '"'

I know I cannot do this in standard ANTLR4. Is it possible to do it with actions?

Answer 1

Here is one way to do it:

lexer grammar DemoLexer;

@members {

def ahead(self, steps):
    """
    Returns the next `steps` characters ahead in the character-stream, or None if
    there aren't `steps` characters ahead anymore
    """
    text = ""
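    for n in range(1, steps + 1):
        # LA(n) peeks at the n-th upcoming character without consuming it;
        # named `char` to avoid shadowing the built-in `next`
        char = self._input.LA(n)
        if char == Token.EOF:
            return None
        text += chr(char)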
    return text

def consume_until(self, open_tag):
    """
    If we get here, it means the lexer matched an opening tag, and we now consume
    characters until we match the corresponding closing tag
    """
    while True:
        ahead = self.ahead(len(open_tag))
        if ahead is None:
            raise Exception("missing '{}' close tag".format(open_tag))
        if ahead == open_tag:
            break
        self._input.consume()

    # Be sure to consume the end_tag, which has the same character count as `open_tag`
    for n in range(0, len(open_tag)):
        self._input.consume()

}

STRING
 : '|' ~'"'* '"' {self.consume_until(self.text)}
 ;

SPACES
 : [ \t\r\n] -> skip
 ;

OTHER
 : .
 ;

If you generate a lexer from the grammar above and run the following (Python) script:

from antlr4 import *
from DemoLexer import DemoLexer


source = """
foo |x"my |"string"|x" bar
"""

lexer = DemoLexer(InputStream(source))
stream = CommonTokenStream(lexer)
stream.fill()

for token in stream.tokens[:-1]:
    print("{0:<25} '{1}'".format(DemoLexer.symbolicNames[token.type], token.text))

the following will be printed to your console:

OTHER                     'f'
OTHER                     'o'
OTHER                     'o'
STRING                    '|x"my |"string"|x"'
OTHER                     'b'
OTHER                     'a'
OTHER                     'r'
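
To experiment with the boundary-matching idea without generating a lexer, the logic of consume_until can be sketched in plain Python. This is only an illustration: scan_bounded_string is a hypothetical helper, not part of ANTLR, and it assumes the opening tag is everything from the '|' up to and including the first '"', just like the STRING rule above.

```python
def scan_bounded_string(text, start=0):
    """
    Hypothetical helper (not part of ANTLR): scan a |boundary"..."|boundary"
    string beginning at text[start] and return the index just past its
    closing tag.
    """
    if not text.startswith("|", start):
        raise ValueError("expected '|' at position {}".format(start))

    # The opening tag runs from '|' up to and including the first '"'
    quote = text.find('"', start + 1)
    if quote == -1:
        raise ValueError("opening tag is missing its '\"'")
    open_tag = text[start:quote + 1]

    # The closing tag is the next occurrence of the very same character sequence
    close = text.find(open_tag, quote + 1)
    if close == -1:
        raise ValueError("missing '{}' close tag".format(open_tag))
    return close + len(open_tag)


source = 'foo |x"my |"string"|x" bar'
begin = source.index("|")
end = scan_bounded_string(source, begin)
print(source[begin:end])  # prints |x"my |"string"|x"
```

As in the lexer action, the inner |" does not end the string, because only the exact opening tag |x" is accepted as the closing tag.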

How do I define a string with an escape boundary in ANTLR4 (like a multipart mimetype)?
