Question

I am trying to build a regular expression for lex that matches bold text in Markdown syntax, for example: __strong text__. I came up with:

__[A-Za-z0-9_ ]+__

and then replacing the matched text with

<strong>Matched text</strong>

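But in Lex this rule leaves the variable yytext as __Matched Text__. How can I get rid of the underscores? Is it better to write a regular expression that does not match the underscores, or to manipulate yytext to remove them?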

It would be easier with capture groups, since I would only need the regular expression:

__([A-z0-9 ]+)__

and use \1. But Lex does not support capture groups.

Answer

I finally went with the first option offered by João Neto, with a slight modification:

yytext[strlen(yytext)-len]='\0'; // exclude last len characters
yytext+=len; // exclude first len characters

I also tried using start conditions, the second option he mentioned, but could not get it to work.

Answer 1

You can process yytext by removing its first two and last two characters.

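yytext[strlen(yytext)-2]='\0'; // exclude last two characters
yylval.str = &yytext[2]; // exclude first two characters
// note: this points into yytext, which is overwritten by the next match; copy it if it must persist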
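For illustration, the same trimming can be tried outside of a scanner. The following is only a minimal sketch in plain C: the helper names strip_delimiters and emit_strong are hypothetical (they are not part of lex or flex), and the input buffer stands in for yytext. It assumes the delimiter has the same length on both sides and that the buffer is writable.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a lex/flex function.
 * Cuts `len` characters off both ends of `text` in place and returns a
 * pointer to the inner text, or NULL if `text` is too short to hold
 * both delimiters. */
char *strip_delimiters(char *text, size_t len)
{
    size_t total = strlen(text);
    if (total < 2 * len)
        return NULL;
    text[total - len] = '\0';   /* exclude last len characters */
    return text + len;          /* exclude first len characters */
}

/* Hypothetical helper: writes <strong>inner</strong> into `out`.
 * Returns 0 on success, -1 if the match is too short or `out` is too small. */
int emit_strong(char *out, size_t out_size, char *matched)
{
    char *inner = strip_delimiters(matched, 2);   /* "__" is two characters */
    if (inner == NULL)
        return -1;
    int n = snprintf(out, out_size, "<strong>%s</strong>", inner);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Inside a lex action the matched buffer would be yytext itself. The length check only matters if the pattern could ever match fewer than four characters, which the pattern from the question cannot.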

Another option is to use start conditions with a state stack:

%option stack
%x bold

%%

"__"         { yy_push_state(bold); yylval.str = new std::string(); }
<bold>"__"   { yy_pop_state(); return BOLD_TOKEN; }
<bold>.|\n   { *yylval.str += yytext; /* str is a std::string*, so dereference before appending */ }

Exclude certain characters from a Lex regular expression
