在语法文件的某个点上,我希望ANTLR将我的输入读取为2个标记而不是一个标记。 在我的源文件中,我有一个值
12345.name
而词法分析器会消耗
12345.
作为FLOAT令牌。在源文件的这一特定点上,我希望ANTLR将其读取为
有没有办法告诉ANTLR在某个给定的点应该忽略FLOAT类型?
这是我当前的.g4文件:
grammar Quest;
import Lua;
@header {
package dev.codeflush.m2qc.antlr;
}
/*
prefixed everything with "m2" to avoid nameclashes
*/
m2QuestFile
: m2Define* m2Quest* EOF
;
m2Define
: 'define' NAME m2DefineValue
;
m2DefineValue
: ~('\r\n' | '\r' | '\n')
;
m2Quest
: 'quest' NAME 'begin' m2State* 'end'
;
m2State
: 'state' NAME 'begin' (m2TriggerBlock | m2Function)* 'end'
;
m2TriggerBlock
: 'when' m2Trigger ('or' m2Trigger)* ('with' exp)? 'begin' block 'end'
;
m2Function
: 'function' NAME funcbody
;
m2Trigger
: m2TriggerTarget DOT m2TriggerEvent DOT m2TriggerSubEvent DOT m2TriggerArgument
| m2TriggerTarget DOT m2TriggerEvent DOT m2TriggerArgument
| m2TriggerTarget DOT m2TriggerEvent
| m2TriggerEvent
;
m2TriggerTarget
: NAME
| INT
| NORMALSTRING
;
/*
not complete
*/
m2TriggerEvent
: 'button'
| 'enter'
| 'info'
| 'item_informer'
| 'kill'
| 'leave'
| 'letter'
| 'levelup'
| 'login'
| 'logout'
| 'unmount'
| 'target'
| 'chat'
| 'timer'
| 'server_timer'
;
m2TriggerSubEvent
: 'click'
| 'chat'
| 'arrive'
;
m2TriggerArgument
: exp
;
DOT
: '.'
;
我正在使用https://github.com/antlr/grammars-v4/blob/master/lua/Lua.g4的Lua语法
我当前的样本输入文件如下:
quest test begin
state start begin
when kill begin
end
when "12345".kill begin
end
when 12345.kill begin
end
end
end
前两个按预期工作,但第三个不按预期工作(因为词法分析器将“ 12345.”读为一个FLOAT令牌)
答案 0 :(得分:1)
在特定条件下(这里:当点后面直接跟标识符(包括关键字)时),我在语法where I wanted to issue multiple tokens (2 actually) for a single match中也有类似的需求。
// Special rule that should also match all keywords if they are directly preceded by a dot.
// Hence it's defined before all keywords.
// Here we make use of the ability in our base lexer to emit multiple tokens with a single rule.
DOT_IDENTIFIER:
DOT_SYMBOL LETTER_WHEN_UNQUOTED_NO_DIGIT LETTER_WHEN_UNQUOTED* { emitDot(); } -> type(IDENTIFIER)
;
需要helper function来发出额外的令牌:
/**
* Puts a DOT token onto the pending token list.
*/
void MySQLBaseLexer::emitDot() {
_pendingTokens.emplace_back(_factory->create({this, _input}, MySQLLexer::DOT_SYMBOL, _text, channel,
tokenStartCharIndex, tokenStartCharIndex, tokenStartLine,
tokenStartCharPositionInLine));
++tokenStartCharIndex;
}
则需要对令牌生产进行自定义处理。您必须在令牌流中使用override nextToken
方法,以在返回下一个真实令牌之前考虑待处理令牌列表。
/**
* Allow a grammar rule to emit as many tokens as it needs.
*/
std::unique_ptr<antlr4::Token> MySQLBaseLexer::nextToken() {
// First respond with pending tokens to the next token request, if there are any.
if (!_pendingTokens.empty()) {
auto pending = std::move(_pendingTokens.front());
_pendingTokens.pop_front();
return pending;
}
// Let the main lexer class run the next token recognition.
// This might create additional tokens again.
auto next = Lexer::nextToken();
if (!_pendingTokens.empty()) {
auto pending = std::move(_pendingTokens.front());
_pendingTokens.pop_front();
_pendingTokens.push_back(std::move(next));
return pending;
}
return next;
}
请紧记:词法分析器规则仍然会发布自己的令牌(在这里我将其设置为IDENTIFIER
),这意味着您只需要发布其他令牌。