Regular expression to match all quoted strings

Asked: 2013-11-22 22:38:29

Tags: regex

I'm trying to write a regexp that, if possible, matches all quoted strings in a text. An example:

ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.

I would like to get these results:

's previously reported plans for dramas ' (not useful, but I can manage it)
'Once Upon A Time,'
' '
'Revenge,'
' 'Grey'
'Grey's Anatomy,'
etc

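So I basically need to match every quote twice. If I use a standard regexp, I won't get 'Once Upon A Time' and 'Grey's Anatomy', for obvious reasons.

Thanks for any suggestions!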

1 answer:

Answer 0: (score: 2)

Here is a solution in Perl that works for the given example. See the live demo.

#!/usr/bin/perl -w

use strict;
use warnings;

while (<DATA>) {

#   \1/ Starting at the beginning of a string or non-word character,
#   \2/ MATCH a single-quote character followed by a character that is
#       *not* a single quote character,
#   \3/ And continue matching one or more times:
#       - a white space character,
#       - a word character,
#       - a comma,
#       - or a single-quote that is followed by a lower-case 's' or 't'
#         ending a word (as in "Grey's" or "haven't").
#   \4/ And END the match on a single quote.
#   \5/ Continue searching for additional matches.

    my @matches = /(?:\A|\W)('[^'](?:\w|\s|,|'(?=[st]\b))+')/g;

#                  \___1___/\__2_/\___________3__________/4/\5/

    print join("\n", @matches), "\n";
}

__END__
 'At the Beginning' ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.

Expected output:

'At the Beginning'
'Once Upon A Time,'
'Revenge,'
'Grey's Anatomy,'
'Scandal'
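
The question also asked for overlapping matches, where a quote character closes one span and opens the next. The following is a minimal sketch of one way to get that behaviour. It is not part of the original answer: it wraps a simple quoted-span pattern in a lookahead so that no characters are consumed. The sub name overlapping_quotes and the shortened sample string are hypothetical and used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): return every
# single-quoted span, including spans that overlap one another.
sub overlapping_quotes {
    my ($text) = @_;
    my @found;

    # The lookahead (?=...) consumes no characters, so the next search can
    # start at the very next quote character. This lets a closing quote
    # double as the opening quote of the following span.
    while ($text =~ /(?=('[^']+'))/g) {
        push @found, $1;
    }
    return @found;
}

# Illustrative sample, shortened from the text in the question.
my $sample = "dramas 'Once Upon A Time,' 'Revenge,'";
print "$_\n" for overlapping_quotes($sample);
```

For this sample the sketch prints three lines: 'Once Upon A Time,' then ' ' then 'Revenge,'. Unlike the pattern in the answer, it treats every single quote as a delimiter, so an apostrophe such as the one in "Grey's" would split a title into two spans.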