Question

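How can I write a lex (or flex) program that removes nested comments from a text and prints only the text that is not inside a comment? I probably need to track somehow the state of being inside a comment, as well as the number of opening "tags" of block comments seen so far.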

Let's take these rules:
1. Block comment

/*
block comment
*/

2. Line comment

// line comment

3. Comments can be nested.

Example 1

show /* comment /* comment */ comment */ show

Output:

show  show

Example 2

show /* // comment
comment
*/
show

Output:

show 
show

Example 3

show
///* comment
comment
// /*
comment
//*/ comment
//
comment */
show

Output:

show
show

Answer 1

Your theory is correct. Here is a simple implementation; it could be improved.

%x COMMENT
%%
%{
   int comment_nesting = 0;
%}

"/*"            BEGIN(COMMENT); ++comment_nesting;
"//".*          /* // comments to end of line */

<COMMENT>[^*/]* /* Eat non-comment delimiters */
<COMMENT>"/*"   ++comment_nesting;
<COMMENT>"*/"   if (--comment_nesting == 0) BEGIN(INITIAL);
<COMMENT>[*/]   /* Eat a / or * if it doesn't match comment sequence */

  /* Could have been .|\n ECHO, but this is more efficient. */
([^/]*([/][^/*])*)* ECHO;  
%%

Answer 2

This is exactly what you need: yy_push_state(COMMENT). It uses a stack to store the start states, which comes in handy for nested constructs.
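
As a rough illustration of that idea, here is a minimal sketch, assuming flex (start-condition stacks are a flex extension and have to be enabled with %option stack). The depth of the state stack takes over the role of a nesting counter: each "/*" pushes COMMENT, each "*/" pops, and the last pop returns to INITIAL. The overall layout and the main function are illustrative assumptions, not part of the answer above.

```lex
%option noyywrap stack
%x COMMENT
%%
"/*"              yy_push_state(COMMENT);  /* outermost comment: INITIAL is saved on the stack */
"//".*            /* line comment: discard everything up to, not including, the newline */
<COMMENT>"/*"     yy_push_state(COMMENT);  /* nested comment: one level deeper */
<COMMENT>"*/"     yy_pop_state();          /* return to whatever state enclosed this comment */
<COMMENT>.|\n     /* discard comment text, including "//" inside a block comment */
.|\n              ECHO;                    /* everything else is copied to the output */
%%
int main(void) { return yylex(); }
```

Assuming the file is saved as strip.l (a hypothetical name), something like `flex strip.l && cc lex.yy.c -o strip` should build a filter that reads standard input. Because flex prefers the longest match, the two-character delimiters always win over the single-character catch-all rules.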

Answer 3

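I'm afraid @rici's answer may be wrong. First, we need to keep track of the line number, since a file line directive may change it later. Second, define open_sign and close_sign. Then the following principles apply:

1) Use an integer as the stack: increment it for an opening delimiter, decrement it for a closing one.
2) Inside a comment, eat every character up to EOF or the closing delimiter, counting any opening delimiters met on the way:
<comments>{open} {no_open_sign++;}
<comments>\n {curr_lineno++;}
<comments>. /* eat any other character by doing nothing */
3) Errors can occur around the point where no_open_sign drops to zero, so the closing delimiter needs its own rule:
<comments>{close}  /* as in the post above: decrement, and leave the state at zero */
4) EOF must not occur inside a comment, hence you need a rule
<comments><<EOF>> {return ERROR_TOKEN;}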

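To make this more robust, you also need another rule, outside the <comments> start condition, that checks for a stray closing delimiter.

In practice, if your lexer supports it, you should use negative lookahead and lookbehind in the regular expression syntax.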
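
The principles in this answer can be put together roughly as follows. This is only a sketch, assuming flex: it has no lookaround beyond trailing context, so the sketch relies on longest-match and rule order instead. The names comments, open, close, no_open_sign, curr_lineno and ERROR_TOKEN follow the fragments above; the value of ERROR_TOKEN, the error messages and the main function are assumptions made for illustration.

```lex
%option noyywrap
%x comments
%{
#include <stdio.h>
enum { ERROR_TOKEN = 1 };     /* hypothetical error code returned by yylex() */
static int no_open_sign = 0;  /* number of block comments currently open */
static int curr_lineno = 1;   /* current input line, counted in every state */
%}
open   "/*"
close  "*/"
%%
{open}              { no_open_sign = 1; BEGIN(comments); }
{close}             { fprintf(stderr, "line %d: unmatched */\n", curr_lineno); return ERROR_TOKEN; }
"//".*              /* line comment: discard up to, not including, the newline */
\n                  { ++curr_lineno; ECHO; }
.                   ECHO;
<comments>{open}    ++no_open_sign;
<comments>{close}   if (--no_open_sign == 0) BEGIN(INITIAL);
<comments>\n        ++curr_lineno;
<comments>.         /* eat any other character by doing nothing */
<comments><<EOF>>   { fprintf(stderr, "line %d: unterminated comment\n", curr_lineno); return ERROR_TOKEN; }
%%
int main(void) { return yylex() == ERROR_TOKEN ? 1 : 0; }
```

The `{close}` rule in the initial state is the extra check outside `<comments>`: a "*/" with no matching "/*" is reported instead of being echoed. Whether that should be an error depends on the input language, so treat it as one possible design choice.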
