I'm trying to write a regexp that will match, if possible, all of the quoted strings in a text. An example:
ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.
I would like to get this as the result:
's previously reported plans for dramas ' (not useful but i can manage it)
'Once Upon A Time,'
' '
'Revenge,'
' 'Grey'
'Grey's Anatomy,'
etc
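So I basically need to match each quote character twice. If I use a standard regex, I won't get 'Once Upon A Time' and 'Grey's Anatomy', because each quote is consumed by the preceding match and cannot open the next one.
Thanks for any suggestions!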
Answer 0 (score: 2)
Here is a solution in Perl that works for the given example. See the live demo.
#!/usr/bin/perl -w
use strict;
use warnings;
while (<DATA>) {
    # \1/ Starting at the beginning of a string or non-word character,
    # \2/ MATCH a single-quote character followed by a character that is
    #     *not* a single quote character,
    # \3/ And continue matching one or more times:
    #     - a white space character,
    #     - a word character,
    #     - a comma,
    #     - or a single-quote that is followed by a lower-case 's' or 't'.
    # \4/ And END the match on a single quote.
    # \5/ Continue searching for additional matches.
    my @matches = /(?:\A|\W)('[^'](?:\w|\s|,|'(?=[st]\b))+')/g;
    #              \___1___/\__2_/\___________3__________/4/\5/
    print join("\n", @matches), "\n";
}
__END__
'At the Beginning' ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.
Expected output:
'At the Beginning'
'Once Upon A Time,'
'Revenge,'
'Grey's Anatomy,'
'Scandal'
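If you really do want each quote character matched twice, as the question describes, one possible approach is to put the capture inside a zero-width lookahead so that a closing quote can be reused as the next opening quote. The following is only a minimal sketch of that idea, not part of the solution above: the helper name adjacent_quote_spans and the shortened sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the solution above): in list context it
# returns every span running from one single quote to the very next one,
# including spans that overlap by sharing a quote character.
sub adjacent_quote_spans {
    my ($text) = @_;
    # The lookahead consumes no characters, so after each match the engine
    # retries one position later and the quote that closed the previous
    # span can open the next one.
    return $text =~ /(?=('[^']*'))/g;
}

# Shortened sample text, assumed here only for illustration.
my $sample = "dramas 'Once Upon A Time,' 'Revenge,' and 'Scandal' haven't changed.";
print "$_\n" for adjacent_quote_spans($sample);
```

On this sample it should print the three titles along with the in-between spans ' ', ' and ' and ' haven', which you would then have to filter out yourself. Note that it only pairs adjacent quotes, so a title containing an apostrophe such as 'Grey's Anatomy,' comes back as 'Grey' and 's Anatomy,' rather than as one string; the apostrophe lookahead in the regex above is what handles that case.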