Comment pattern matching in flex using states

Date: 2018-05-08 15:16:44

Tags: regex flex-lexer

I am trying to match the single-line comment pattern in flex. A comment can look like this:

//this is a single /(some random stuff) line comment

Or it can look like this:

// this is also a comment\
continuation of the comment from previous line

As the examples show, I have to handle the multi-line case.

My current approach is to use states. This is what I have so far:

"//"                    {
                            yymore();
                            BEGIN (SINGLE_COMMENT); 
                        }

<SINGLE_COMMENT>([^{NEWLINE}]|\\[(.){NEWLINE}]) {
                                                    yymore();
                                                }           

<SINGLE_COMMENT>([^{NEWLINE}]|[^\\]{NEWLINE})   {
                                                    logout << "Line no " << line_count << ": TOKEN <COMMENT> Lexeme " << string(yytext) << "\nfound\n\n";
                                                    BEGIN (INITIAL);
                                                }

NEWLINE is declared as:

 NEWLINE \r?\n

My declarations section:

%option noyywrap

%x SINGLE_COMMENT

int line_count = 1;
const int bucketSize = 10; // change if necessary

ofstream logout;
ofstream tokenout;

SymbolTable symbolTable(bucketSize);

The action for NEWLINE:

{NEWLINE}    {
                line_count++;
             }

If I run it with the following input:

// hello\
int main

This is my log file:

Line no 1: TOKEN <COMMENT> Lexeme // hello\

found

Line no 1: TOKEN <INT> Lexeme int found

Line no 1: TOKEN <ID> Lexeme main found


 ScopeTable # 1

 6 --> < main , ID > 

So it does not catch the multi-line comment. Also, line_count is not incremented; it stays the same. Can anyone help me figure out what I am doing wrong?

Link to code

1 answer:

Answer 0 (score: 2)

In (f)lex, as in most regular expression engines, [ and ] enclose a character class description. A character class is a set of individual characters, and it always matches exactly one character, which is a member of that set. There are also negated character classes, which are written the same way except that they start with [^ and match exactly one character that is not a member of the set.

A character class is not the same as a sequence of characters:

  • ab matches an a followed by a b
  • [ab] matches either an a or a b

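Since character classes are just sets of characters, it makes no sense for individual characters in a class to be repeated, optional, and so on. Consequently, almost none of the regular expression operators (*, +, ?, etc.) are meaningful inside a character class. If you put one of them in a character class expression, it is treated as an ordinary character:

  • a* matches zero or more a's
  • [a*] matches either an a or a *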
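
The two lists above can be tried out with a small standalone scanner. This is only a sketch: the rule messages and the build command (something like `flex demo.l && cc lex.yy.c`, where demo.l is a made-up file name) are assumptions for illustration.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
ab       { printf("sequence: %s\n", yytext);   /* an a followed by a b */ }
[ab]     { printf("class [ab]: %s\n", yytext); /* exactly one character */ }
[*+?]    { printf("literal: %s\n", yytext);    /* operators are plain characters in a class */ }
.|\n     { /* ignore everything else */ }
%%
int main(void) { yylex(); return 0; }
```

For the input abba*, the first rule takes ab because it is the longest match, the following b and a are each matched by [ab], and the * is matched by the third rule as an ordinary character.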

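One flex feature that most other regex systems do not provide is macro expansion of the form {name}. Here, { and } indicate the expansion of a defined macro whose name appears between the braces. These characters are not special inside a character class either:

  • {identifier} matches whatever is matched by the expansion of the macro named identifier.
  • [{identifier}] matches a single character that is either {, } or one of the letters definrt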
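
Applied to the NEWLINE macro from the question, the difference can be seen with a sketch like the following. The macro definition is taken from the question; the messages and the main function are made up for illustration.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
NEWLINE  \r?\n
%%
{NEWLINE}      { printf("macro expansion: line ending\n"); }
[{NEWLINE}]    { printf("class member: %s\n", yytext); /* one of { } N E W L I */ }
.              { printf("other: %s\n", yytext); }
%%
int main(void) { yylex(); return 0; }
```

Only the characters {, }, N, E, W, L and I reach the second rule; an actual line ending goes to the first one. By the same reasoning, [^{NEWLINE}] in the question is a negated class that excludes just those seven characters, so it happily matches \r and \n, which would explain why the comment is cut off after the first line.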

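Beginners seem to overuse macro definitions. My advice is to always avoid them, and thereby avoid the confusion they cause.

It is also worth noting that (f)lex has no operator that negates a subpattern. Only character classes can be negated; there is no easy way to write "match anything other than foo". However, you can usually rely on the first longest-match rule to implement negation in effect: if the action for some pattern p executes, then no other pattern matched a longer string than p did. So it may not be necessary to write the negation explicitly.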

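For example, in your comment detector, the only real problem is handling a carriage return (\r) that is not followed by a newline character, and you can use (f)lex's pattern-matching algorithm to your advantage here.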
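
One way this could look is sketched below. It is a minimal illustration of the longest-match idea, not the answerer's exact rules: it reuses SINGLE_COMMENT and line_count from the question, while the printf-based logging and the catch-all rules in the INITIAL state are assumptions that stand in for the rest of the scanner.

```lex
%{
#include <stdio.h>
int line_count = 1;
%}
%option noyywrap
%x SINGLE_COMMENT
%%
"//"                        { yymore(); BEGIN(SINGLE_COMMENT); }
<SINGLE_COMMENT>[^\\\r\n]+  { yymore(); /* ordinary comment text */ }
<SINGLE_COMMENT>\\\r?\n     { line_count++; yymore(); /* backslash + line ending: comment continues */ }
<SINGLE_COMMENT>\\          { yymore(); /* backslash with no line ending after it */ }
<SINGLE_COMMENT>\r          { yymore(); /* carriage return with no \n after it */ }
<SINGLE_COMMENT>\r?\n       { /* yytext holds the whole comment, including this line ending */
                              printf("Line no %d: TOKEN <COMMENT> Lexeme %s", line_count, yytext);
                              line_count++;
                              BEGIN(INITIAL); }
\r?\n                       { line_count++; }
.                           { /* other tokens would be handled here */ }
%%
int main(void) { yylex(); return 0; }
```

No negation is written for the backslash or the carriage return. When a backslash is followed by a line ending, the continuation rule matches more text than the lone-backslash rule, so it wins; the lone \\ and \r rules only fire when nothing longer applies. As written, the sketch reports the line on which the comment ends, and a comment that reaches end-of-file without a final newline is never logged, so a real scanner would probably also want an <<EOF>> rule for that state.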

By the way, providing %option yylineno is usually much easier than trying to track newlines by hand.
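
A minimal sketch of that option, assuming a scanner that only reports identifiers (the identifier pattern and the message format are made up for illustration):

```lex
%{
#include <stdio.h>
%}
%option noyywrap yylineno
%%
[A-Za-z_][A-Za-z0-9_]*   { printf("Line no %d: %s\n", yylineno, yytext); }
.|\n                     { /* skip; flex updates yylineno for the newline itself */ }
%%
int main(void) { yylex(); return 0; }
```

With the option enabled, flex maintains yylineno for every newline it scans, so no rule needs a counter of its own.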