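Retrieving skipped whitespace in an ANTLR4 parser from a listener

Time: 2019-07-15 17:40:14

Tags: c++ antlr antlr4

I am trying to construct an object from a parsed message, using ANTLR4 and C++. My problem is that I need to skip whitespace during lexing, but I have to get it back when I construct the message object in the listener. Here is my grammar: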

grammar MessageTest;
WS: ('\t' | ' ' | '\r' | '\n' )+ -> skip;

message: 
    messageInfo
    startOfMessage
    messageText+
| EOF;

messageInfo:
    senderName
    filingTime
    receiverName
;

senderName: WORD;

filingTime: DIGITS;

receiverName: WORD;

messageText: ( WORD | DIGITS | ALLOWED_SYMBOLS)+;

startOfMessage: START_OF_MESSAGE_SYMBOL ;

START_OF_MESSAGE_SYMBOL:':';

WORD: LETTER+;

DIGITS: DIGIT+;

LPAREN: '(';
RPAREN: ')';

ALLOWED_SYMBOLS:   '-'| '.' | ',' | '/' | '+' | '?';

fragment LETTER: [A-Z];

fragment DIGIT: [0-9];

The grammar works well, and my parse tree is correct for the following sample message: JOHN0120JANE:HI HOW ARE YOU? I get this parse tree:

message (
 messageInfo (
  senderName (
   "JOHN"
  )
  filingTime (
   "0120"
  )
  receiverName (
   "JANE"
  )
 )
 startOfMessage (
  ":"
 )
 messageText (
  "HI"
  "HOW"
  "ARE"
  "YOU"
  "?"
 )
)

The problem is that when I try to retrieve the whole messageText as HI HOW ARE YOU?, what I get from the MessageTextContext is HIHOWAREYOU?

What am I doing wrong?

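2 answers:

Answer 0 (score: 1)

The getText() retrieval function never considers skipped or hidden tokens. However, it is easy to get the original text of your input (or just the range that corresponds to a specific parse rule) by using the character indexes stored in the generated tokens. A parse rule context holds a start token and a stop token, so you can go from the context back to the original input like this: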


std::string MySQLRecognizerCommon::sourceTextForContext(ParserRuleContext *ctx, bool keepQuotes) {
  return sourceTextForRange(ctx->start, ctx->stop, keepQuotes);
}

//----------------------------------------------------------------------------------------------------------------------

std::string MySQLRecognizerCommon::sourceTextForRange(tree::ParseTree *start, tree::ParseTree *stop, bool keepQuotes) {
  Token *startToken = antlrcpp::is<tree::TerminalNode *>(start) ? dynamic_cast<tree::TerminalNode *>(start)->getSymbol()
                                                                : dynamic_cast<ParserRuleContext *>(start)->start;
  Token *stopToken = antlrcpp::is<tree::TerminalNode *>(stop) ? dynamic_cast<tree::TerminalNode *>(stop)->getSymbol()
                                                              : dynamic_cast<ParserRuleContext *>(stop)->stop;
  return sourceTextForRange(startToken, stopToken, keepQuotes);
}

//----------------------------------------------------------------------------------------------------------------------

std::string MySQLRecognizerCommon::sourceTextForRange(Token *start, Token *stop, bool keepQuotes) {
  CharStream *cs = start->getTokenSource()->getInputStream();
  size_t stopIndex = stop != nullptr ? stop->getStopIndex() : std::numeric_limits<size_t>::max();
  std::string result = cs->getText(misc::Interval(start->getStartIndex(), stopIndex));
  if (keepQuotes || result.size() < 2)
    return result;

  char quoteChar = result[0];
  if ((quoteChar == '"' || quoteChar == '`' || quoteChar == '\'') && quoteChar == result.back()) {
    if (quoteChar == '"' || quoteChar == '\'') {
      // Replace any double occurrence of the quote char by a single one.
      replaceStringInplace(result, std::string(2, quoteChar), std::string(1, quoteChar));
    }

    return result.substr(1, result.size() - 2);
  }

  return result;
}

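This code is tailored for use with MySQL (for example, the handling of quoted strings), but it is easy to adapt to any other use case. The essential part is to take the tokens (for example, from a parse rule context) and get the original input from the character input stream.

Code taken from the MySQL Workbench code base.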
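To make the difference between the two retrieval strategies concrete, here is a minimal, self-contained sketch in plain C++. It does not use the ANTLR runtime: `Tok`, `lexSkippingWhitespace`, `concatTokens`, and `sourceText` are hypothetical names invented for this illustration, and the tiny lexer only mimics the `WS -> skip` behavior of the grammar in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token: only the inclusive character indexes
// into the original input are kept (similar in spirit to start/stop indexes).
struct Tok {
  std::size_t start;  // index of the first character
  std::size_t stop;   // index of the last character (inclusive)
};

static bool isUpper(char c) { return c >= 'A' && c <= 'Z'; }
static bool isDigit(char c) { return c >= '0' && c <= '9'; }

// Lex runs of letters, runs of digits, and single symbols. Whitespace
// produces no token at all, which mirrors "WS -> skip".
std::vector<Tok> lexSkippingWhitespace(const std::string &input) {
  std::vector<Tok> tokens;
  std::size_t i = 0;
  while (i < input.size()) {
    const char c = input[i];
    if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
      ++i;  // skipped: no token is emitted
      continue;
    }
    const std::size_t begin = i;
    if (isUpper(c)) {
      while (i < input.size() && isUpper(input[i])) ++i;
    } else if (isDigit(c)) {
      while (i < input.size() && isDigit(input[i])) ++i;
    } else {
      ++i;  // any other character is a one-character token
    }
    tokens.push_back({begin, i - 1});
  }
  return tokens;
}

// Concatenate the texts of tokens[first..last]. Skipped whitespace is lost,
// which is the effect the question observes.
std::string concatTokens(const std::string &input, const std::vector<Tok> &tokens,
                         std::size_t first, std::size_t last) {
  std::string result;
  for (std::size_t i = first; i <= last && i < tokens.size(); ++i)
    result += input.substr(tokens[i].start, tokens[i].stop - tokens[i].start + 1);
  return result;
}

// Slice the original input from the first token's start index to the last
// token's stop index. Everything in between, whitespace included, is kept.
std::string sourceText(const std::string &input, const Tok &first, const Tok &last) {
  assert(first.start <= last.stop && last.stop < input.size());
  return input.substr(first.start, last.stop - first.start + 1);
}
```

For the input JOHN0120JANE:HI HOW ARE YOU? the tokens at positions 4 to 8 correspond to messageText. Concatenating them gives HIHOWAREYOU?, while slicing the source between the first and last of them gives HI HOW ARE YOU?. In a real listener the same slice would be taken from the character stream using the start index of ctx->start and the stop index of ctx->stop, as in the sourceTextForRange function above.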

Answer 1 (score: 0)

It looks like you want Lexical Modes.

The idea behind them is simple: when your lexer encounters START_OF_MESSAGE_SYMBOL, it has to switch to a context in which only one token is available, say a MESSAGE_TEXT token. Once this token has been recognized, the lexer switches back to its default mode.

To do this, you should first split your grammar into two parts, a lexer grammar and a parser grammar, because lexical modes are not allowed in a combined grammar. Then you can use the pushMode() and popMode() commands.

Here is an example:

MessageTestLexer.g4

MessageTestParser.g4

P.S. I haven't tested these grammars, but they look like they should work.
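To illustrate the mode-switching idea, here is a small hand-written model in plain C++. It is only a sketch of the concept and not ANTLR-generated code or the ANTLR runtime API: `Mode`, `ModalToken`, and `lexWithModes` are hypothetical names, and the assumption that the message text runs to the end of the input is mine, not something stated in the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical lexer modes: the default one skips whitespace, the
// message-text one treats whitespace as ordinary content.
enum class Mode { Default, MessageText };

struct ModalToken {
  std::string type;
  std::string text;
};

std::vector<ModalToken> lexWithModes(const std::string &input) {
  std::vector<ModalToken> out;
  std::vector<Mode> modes{Mode::Default};  // mode stack; back() is the active mode
  std::size_t i = 0;
  while (i < input.size()) {
    assert(!modes.empty());
    if (modes.back() == Mode::MessageText) {
      // Assumption: the rest of the input is one MESSAGE_TEXT token.
      out.push_back({"MESSAGE_TEXT", input.substr(i)});
      i = input.size();
      modes.pop_back();  // plays the role of popMode
      continue;
    }
    const char c = input[i];
    if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
      ++i;  // default mode: whitespace is skipped
      continue;
    }
    if (c == ':') {
      out.push_back({"START_OF_MESSAGE_SYMBOL", ":"});
      modes.push_back(Mode::MessageText);  // plays the role of pushMode
      ++i;
      continue;
    }
    const std::size_t begin = i;
    std::string type;
    if (c >= 'A' && c <= 'Z') {
      type = "WORD";
      while (i < input.size() && input[i] >= 'A' && input[i] <= 'Z') ++i;
    } else if (c >= '0' && c <= '9') {
      type = "DIGITS";
      while (i < input.size() && input[i] >= '0' && input[i] <= '9') ++i;
    } else {
      type = "SYMBOL";
      ++i;
    }
    out.push_back({type, input.substr(begin, i - begin)});
  }
  return out;
}
```

With the input JOHN 0120 JANE:HI HOW ARE YOU? this model produces five tokens, and the last one, MESSAGE_TEXT, carries the text HI HOW ARE YOU? with its spaces intact, because whitespace is only skipped while the default mode is active.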