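I'm trying to define a rule in flex that captures a "multiline string".
A multiline string starts with three apostrophes ('''), ends with three apostrophes, and can span multiple lines.
For example: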
'''This is
an example of
a multiline
string'''
So my attempt was this:
%{
#include<iostream>
using std::cout;
using std::endl;
%}
MULTI_LN_STR '''(.|\n)*'''
%%
{MULTI_LN_STR} {cout<<"GotIt!";}
%%
int main(int argc, char* argv[]) {
    // Read the test input from a file instead of stdin.
    yyin = fopen("test.txt", "r");
    if (!yyin) {
        cout << "yyin is NULL" << endl;
        return 1;
    }
    yylex();  // run the scanner over the whole file
    return 0;
}
This works for the input:
'''This is
a multi
line
string!'''
This is
some random
text
The output is:
GotIt!
This is
some random
text
But it does not work (or, more precisely, it produces the wrong output) for this input:
'''This is
a multi
line
string!'''
This is
some random
text
'''and this
is another
multiline
string'''
which produces:
GotIt!
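This happens because my rule says:
"scan three apostrophes, then any possible characters, then three apostrophes",
whereas it should say:
"scan three apostrophes, then any possible characters except three apostrophes, then three apostrophes".
How can I do that?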
Answer 0 (score: 2)
For a simple negation like this, it is relatively easy to construct the regular expression:
"'''"([^']|'[^']|''[^'])*"'''"
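Answer 1 (score: -2)
It seems the {x,y} range quantifier is supported,
so this works, and in the Perl benchmark below it is faster than the alternation version.
If you have large strings, this is the way to go.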
'''[^']*(?:[']{1,2}[^']+)*'''
'''
[^']*
(?: [']{1,2} [^']+ )*
'''
Benchmark: alternation vs. non-alternation
-----------------------------
'''Set 1 - this
is another
multiline
string'''
Regex_FAST (?-xism:'''[^']*(?:[']{1,2}[^']+)*''')
-took: 0.811201 wallclock secs ( 0.81 usr + 0.00 sys = 0.81 CPU)
'''Set 1 - this
is another
multiline
string'''
Regex_ALT (?-xism:'''(?:[^']|'[^']|''[^'])*''')
-took: 1.4971 wallclock secs ( 1.50 usr + 0.00 sys = 1.50 CPU)
-----------------------------
'''Set 2 - this
is' another
mul'tiline
st''ring'''
Regex_FAST (?-xism:'''[^']*(?:[']{1,2}[^']+)*''')
-took: 0.935462 wallclock secs ( 0.94 usr + 0.00 sys = 0.94 CPU)
'''Set 2 - this
is' another
mul'tiline
st''ring'''
Regex_ALT (?-xism:'''(?:[^']|'[^']|''[^'])*''')
-took: 1.85556 wallclock secs ( 1.86 usr + 0.00 sys = 1.86 CPU)
Benchmark code:
use strict;
use warnings;
use Benchmark ':hireswallclock';
my ($t0,$t1);
my @dataset = (
"'''Set 1 - this\nis another\nmultiline\nstring'''",
"'''Set 2 - this\nis' another\nmul'tiline\nst''ring'''" );
my $regex_FAST = qr/'''[^']*(?:[']{1,2}[^']+)*'''/;
my $regex_ALT = qr/'''(?:[^']|'[^']|''[^'])*'''/;
for my $data (@dataset)
{
    print "-----------------------------\n";
    ##
    while ($data =~ /$regex_FAST/g){ print "$&\n"; };
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        while ($data =~ /$regex_FAST/g){ };
    }
    $t1 = Benchmark->new;
    print " Regex_FAST $regex_FAST\n -took: ", timestr(timediff($t1, $t0)), "\n\n";
    ##
    while ($data =~ /$regex_ALT/g){ print "$&\n"; };
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        while ($data =~ /$regex_ALT/g){ };
    }
    $t1 = Benchmark->new;
    print " Regex_ALT $regex_ALT\n -took: ", timestr(timediff($t1, $t0)), "\n\n";
}