Question

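I am trying to define a rule in flex that captures a "multi-line string". A multi-line string is a string that starts with three apostrophes ('''), ends with three apostrophes, and can span multiple lines.
For example: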

'''This is
an example of
a multiline
string'''

So my attempt looks like this:

%{
#include<iostream>
using std::cout;
using std::endl;

%}

MULTI_LN_STR    '''(.|\n)*'''

%%

{MULTI_LN_STR}  {cout<<"GotIt!";}   

%%

int main(int argc, char* argv[]) {

    yyin=fopen("test.txt", "r");

    if (!yyin) {
        cout<<"yyin is NULL"<<endl;
        return 1;
    }

    yylex();
    return 0;
}

This works for the input:

'''This is
a multi
line
string!'''

This is
some random
text

The output is:

GotIt!

This is
some random
text

But it does not work for this input (or, more precisely, it produces the wrong output):

'''This is
a multi
line
string!'''

This is
some random
text

'''and this
is another
multiline
string'''

which produces:

GotIt!

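This is because my rule says:
"scan three apostrophes, then any possible characters, then three apostrophes",
when it should say:
"scan three apostrophes, then any possible characters except three apostrophes, then three apostrophes".

How can I do this?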

Answer 1

For a simple negation like this, constructing the regular expression is relatively easy:

"'''"([^']|'[^']|''[^'])*"'''"
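
To see why the negated form fixes the problem from the question, here is a minimal C++ sketch that runs both patterns over the two-string input. It uses std::regex (a backtracking engine, not flex), so it only illustrates what each pattern matches; the names find_all, kGreedy and kNegated are hypothetical helpers introduced for this example, and the flex quoting ("'''") is dropped because std::regex does not need it.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every non-overlapping match of `pattern` in `text`.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar, the std::regex default
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// The question's pattern: "any character" also swallows the closing quotes,
// so the match runs on to the LAST ''' in the input.
const std::string kGreedy = R"('''(.|\n)*''')";

// The pattern from this answer: each repetition consumes a non-quote,
// or one or two quotes followed by a non-quote, so it can never run
// past the first ''' that closes the string.
const std::string kNegated = R"('''([^']|'[^']|''[^'])*''')";
```

Calling find_all on the second input from the question with kGreedy returns a single match that spans both strings and the text between them, while kNegated returns the two strings separately.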

Answer 2

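Flex seems to support the range quantifier {x,y}, so this works, and in a backtracking engine it is faster than the alternation (see the Perl benchmark below).
If you have large strings, this is the way to go.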

'''[^']*(?:[']{1,2}[^']+)*'''

 '''
 [^']* 
 (?: [']{1,2} [^']+ )*
 '''

Benchmark: alternation vs. non-alternation

-----------------------------
'''Set 1 - this
is another
multiline
string'''
 Regex_FAST  (?-xism:'''[^']*(?:[']{1,2}[^']+)*''')
    -took: 0.811201 wallclock secs ( 0.81 usr +  0.00 sys =  0.81 CPU)

'''Set 1 - this
is another
multiline
string'''
 Regex_ALT  (?-xism:'''(?:[^']|'[^']|''[^'])*''')
    -took: 1.4971 wallclock secs ( 1.50 usr +  0.00 sys =  1.50 CPU)

-----------------------------
'''Set 2 - this
is' another
mul'tiline
st''ring'''
 Regex_FAST  (?-xism:'''[^']*(?:[']{1,2}[^']+)*''')
    -took: 0.935462 wallclock secs ( 0.94 usr +  0.00 sys =  0.94 CPU)

'''Set 2 - this
is' another
mul'tiline
st''ring'''
 Regex_ALT  (?-xism:'''(?:[^']|'[^']|''[^'])*''')
    -took: 1.85556 wallclock secs ( 1.86 usr +  0.00 sys =  1.86 CPU)

Benchmark code:

use strict;
use warnings;
use Benchmark ':hireswallclock';

my ($t0,$t1);
my @dataset = (
   "'''Set 1 - this\nis another\nmultiline\nstring'''",
   "'''Set 2 - this\nis' another\nmul'tiline\nst''ring'''" ); 

my $regex_FAST = qr/'''[^']*(?:[']{1,2}[^']+)*'''/;
my $regex_ALT  = qr/'''(?:[^']|'[^']|''[^'])*'''/;

for my $data (@dataset)
{
    print "-----------------------------\n";

  ## Unrolled pattern: print each match once, then time 500,000 scans
    while ($data =~ /$regex_FAST/g){ print "$&\n"; };
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        while ($data =~ /$regex_FAST/g){ };
    }
    $t1 = Benchmark->new;
    print " Regex_FAST  $regex_FAST\n    -took: ", timestr(timediff($t1, $t0)), "\n\n";

  ## Alternation pattern: same procedure, for comparison
    while ($data =~ /$regex_ALT/g){ print "$&\n"; };
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        while ($data =~ /$regex_ALT/g){ };
    }
    $t1 = Benchmark->new;
    print " Regex_ALT  $regex_ALT\n    -took: ", timestr(timediff($t1, $t0)), "\n\n";
}
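
As a quick sanity check that the two benchmarked patterns accept the same text, the following C++ sketch compares them with std::regex. This is an illustration only: std::regex is neither Perl nor flex, and the names kAlt, kUnrolled and first_match are hypothetical helpers introduced for this example.

```cpp
#include <regex>
#include <string>

// Alternation form (Regex_ALT) and unrolled form (Regex_FAST) from the benchmark.
const std::regex kAlt(R"('''(?:[^']|'[^']|''[^'])*''')");
const std::regex kUnrolled(R"('''[^']*(?:'{1,2}[^']+)*''')");

// Return the first match of `re` in `text`, or an empty string if there is none.
std::string first_match(const std::string& text, const std::regex& re) {
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

On both data sets from the benchmark, first_match returns the whole triple-quoted string for either pattern, including Set 2 with its embedded single and double apostrophes.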
