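I'm trying to construct an object from a parsed message. I'm using Antlr4 and C++. My problem is that I need to skip whitespace during lexing, but I have to get it back when I construct the message object in the listener. Here is my grammar: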
grammar MessageTest;
WS: ('\t' | ' ' | '\r' | '\n' )+ -> skip;
message:
messageInfo
startOfMessage
messageText+
| EOF;
messageInfo:
senderName
filingTime
receiverName
;
senderName: WORD;
filingTime: DIGITS;
receiverName: WORD;
messageText: ( WORD | DIGITS | ALLOWED_SYMBOLS)+;
startOfMessage: START_OF_MESSAGE_SYMBOL ;
START_OF_MESSAGE_SYMBOL:':';
WORD: LETTER+;
DIGITS: DIGIT+;
LPAREN: '(';
RPAREN: ')';
ALLOWED_SYMBOLS: '-'| '.' | ',' | '/' | '+' | '?';
fragment LETTER: [A-Z];
fragment DIGIT: [0-9];
This grammar works well, and my parse tree is correct for the following sample message: JOHN0120JANE:HI HOW ARE YOU?
I get this parse tree:
message (
messageInfo (
senderName (
"JOHN"
)
filingTime (
"0120"
)
receiverName (
"JANE"
)
)
startOfMessage (
":"
)
messageText (
"HI"
"HOW"
"ARE"
"YOU"
"?"
)
)
The problem is that when I try to retrieve the whole messageText as:
HI HOW ARE YOU?
I get
HIHOWAREYOU?
from the MessageTextContext.
What am I doing wrong?
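Answer 0 (score: 1)
The getText() retrieval functions never take skipped or hidden tokens into account. However, it is easy to get the original text of your input (or just the range that corresponds to a specific parse rule) by using the indexes stored in the generated tokens. A parse rule context holds a start and a stop token, so you can go from a context back to the original input like this: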
std::string MySQLRecognizerCommon::sourceTextForContext(ParserRuleContext *ctx, bool keepQuotes) {
  return sourceTextForRange(ctx->start, ctx->stop, keepQuotes);
}

//----------------------------------------------------------------------------------------------------------------------

std::string MySQLRecognizerCommon::sourceTextForRange(tree::ParseTree *start, tree::ParseTree *stop, bool keepQuotes) {
  // A tree node is either a terminal (one token) or a rule context (a token range).
  Token *startToken = antlrcpp::is<tree::TerminalNode *>(start) ? dynamic_cast<tree::TerminalNode *>(start)->getSymbol()
                                                                : dynamic_cast<ParserRuleContext *>(start)->start;
  // Use the stop node here, not the start node, to find the last token of the range.
  Token *stopToken = antlrcpp::is<tree::TerminalNode *>(stop) ? dynamic_cast<tree::TerminalNode *>(stop)->getSymbol()
                                                              : dynamic_cast<ParserRuleContext *>(stop)->stop;
  return sourceTextForRange(startToken, stopToken, keepQuotes);
}

//----------------------------------------------------------------------------------------------------------------------

std::string MySQLRecognizerCommon::sourceTextForRange(Token *start, Token *stop, bool keepQuotes) {
  // Read directly from the character stream so that skipped whitespace is included.
  CharStream *cs = start->getTokenSource()->getInputStream();
  size_t stopIndex = stop != nullptr ? stop->getStopIndex() : std::numeric_limits<size_t>::max();
  std::string result = cs->getText(misc::Interval(start->getStartIndex(), stopIndex));
  if (keepQuotes || result.size() < 2)
    return result;

  char quoteChar = result[0];
  if ((quoteChar == '"' || quoteChar == '`' || quoteChar == '\'') && quoteChar == result.back()) {
    if (quoteChar == '"' || quoteChar == '\'') {
      // Replace any double occurrence of the quote char by a single one.
      replaceStringInplace(result, std::string(2, quoteChar), std::string(1, quoteChar));
    }
    return result.substr(1, result.size() - 2);
  }
  return result;
}
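This code is tailored for use with MySQL (for example, the handling of quoted strings), but it is easy to adapt to any other use case. The essential part is to take the tokens (for example, from a parse rule context) and read the original input from the character input stream.
The code is taken from the MySQL Workbench code base.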
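To make the core idea visible without the MySQL-specific parts, here is a minimal sketch that does not depend on the ANTLR runtime. SimpleToken, joinedTokenText and sourceText are hypothetical stand-ins invented for this illustration; in real code the offsets would come from getStartIndex() and getStopIndex() on the context's start and stop tokens, and the slice from the character stream's getText(), as in the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an ANTLR token: only the inclusive character
// offsets into the original input are modelled here.
struct SimpleToken {
  std::size_t startIndex;
  std::size_t stopIndex;
};

// Concatenates the text of each token. Anything the lexer skipped between
// tokens (whitespace here) is lost, which is the behaviour seen in the question.
std::string joinedTokenText(const std::string &input, const std::vector<SimpleToken> &tokens) {
  std::string result;
  for (const SimpleToken &t : tokens)
    result += input.substr(t.startIndex, t.stopIndex - t.startIndex + 1);
  return result;
}

// Slices the original input from the first token's start to the last token's
// stop, so skipped characters in between are preserved.
std::string sourceText(const std::string &input, const SimpleToken &first, const SimpleToken &last) {
  assert(last.stopIndex >= first.startIndex);
  assert(last.stopIndex < input.size());
  return input.substr(first.startIndex, last.stopIndex - first.startIndex + 1);
}
```

For the sample message, the messageText tokens HI, HOW, ARE, YOU and ? sit at offsets 13-14, 16-18, 20-22, 24-26 and 27. Joining them gives the whitespace-free text, while slicing from offset 13 to 27 gives the text as it was typed.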
Answer 1 (score: 0)
It looks like you want Lexical Modes.
The idea behind them is simple: when your lexer encounters START_OF_MESSAGE_SYMBOL, it switches to a context in which only one token is available, for example a MESSAGE_TEXT token.
Once that token has been recognized, the lexer switches back to its default mode.
To do this, you first have to split your grammar into two parts, a lexer grammar and a parser grammar, because lexical modes are not allowed in combined grammars. Then you can use the pushMode() and popMode() commands.
Here is an example, split into two files:
MessageTestLexer.g4
MessageTestParser.g4
P.S. I haven't tested these grammars, but they should work.
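The mode-switching idea can be sketched outside ANTLR as a small hand-written lexer with a mode stack. This is only an illustration of the concept, not ANTLR-generated code: Kind, Tok, Mode and tokenize are hypothetical names, and the assumption that the message text runs to the end of the input is mine. In an ANTLR lexer grammar the same switching would be expressed with mode sections and the pushMode and popMode lexer commands.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stack>
#include <string>
#include <vector>

// Hypothetical token kinds, loosely mirroring the grammar in the question.
enum class Kind { Word, Digits, StartOfMessage, MessageText };

struct Tok {
  Kind kind;
  std::string text;
};

enum class Mode { Default, Message };

// Hand-written illustration of lexer modes: whitespace is skipped in the
// default mode but kept inside the single token produced in message mode.
std::vector<Tok> tokenize(const std::string &input) {
  auto isLetter = [](unsigned char ch) { return ch >= 'A' && ch <= 'Z'; };
  auto isDigit = [](unsigned char ch) { return ch >= '0' && ch <= '9'; };

  std::vector<Tok> tokens;
  std::stack<Mode> modes;
  modes.push(Mode::Default);

  std::size_t i = 0;
  while (i < input.size()) {
    if (modes.top() == Mode::Message) {
      // Assumption: the message text runs to the end of the input.
      tokens.push_back({Kind::MessageText, input.substr(i)});
      i = input.size();
      modes.pop(); // comparable to popMode: back to the default mode
      continue;
    }

    const unsigned char c = static_cast<unsigned char>(input[i]);
    if (std::isspace(c)) { // skipped in the default mode only
      ++i;
    } else if (c == ':') {
      tokens.push_back({Kind::StartOfMessage, ":"});
      modes.push(Mode::Message); // comparable to pushMode
      ++i;
    } else if (isLetter(c) || isDigit(c)) {
      const bool digits = isDigit(c);
      std::size_t j = i;
      while (j < input.size() &&
             (digits ? isDigit(static_cast<unsigned char>(input[j]))
                     : isLetter(static_cast<unsigned char>(input[j]))))
        ++j;
      tokens.push_back({digits ? Kind::Digits : Kind::Word, input.substr(i, j - i)});
      i = j;
    } else {
      ++i; // other characters are ignored in this sketch
    }
  }
  return tokens;
}
```

With the sample message, the header is still split into JOHN, 0120 and JANE, but everything after the colon arrives as one MESSAGE_TEXT-style token with its spaces intact, so the listener no longer has to reassemble it.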