Question

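I would like to use a regular expression to find quoted text in a string, that is, the words between a pair of quotation marks. It should handle both double quotes and single quotes.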

For example, if I have the string:

The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock.

then it should identify the following:

cat and the hat
mouse ran

What would the regular expression be?

Answer 1

(["']).*?\1

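works for me. This assumes there are no quotes nested inside the quotes; more precisely, the quoted text must not contain the same quote character that delimits it.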
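As a rough illustration of that pattern in use, here is a minimal Python sketch. The second capture group and the helper name `extract_quoted` are additions for this example (they are not part of the answer); the extra group is only there so the text between the quotes can be read without the quotes themselves.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
# Group 2 (added for this sketch) lazily captures the text in between.
QUOTED = re.compile(r"""(["'])(.*?)\1""")

def extract_quoted(text):
    """Return the text found between matching single or double quotes."""
    return [m.group(2) for m in QUOTED.finditer(text)]

sample = """The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock."""
print(extract_quoted(sample))  # ['cat and the hat', 'mouse ran']
```

Because the closing quote must equal the opening one, a different quote character inside the quoted text is tolerated, but an apostrophe in unquoted prose (as in "don't") would be treated as an opening quote.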

Answer 2

#!/usr/bin/env perl
use 5.010;

my $quoted_rx = qr{
    (?<quote> ['"] )  # SO highlight bug "'
    (?<guts> 
       (?: (?! \k<quote> ) . ) *
    )
    \k<quote>
}sx;

my $string = <<'END_OF_STRING';
The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock.
END_OF_STRING

while ($string =~ /$quoted_rx/g) {   # use the pattern defined above
     say $+{guts};
}

On each match, the kind of quote that was found is available in $+{quote}, and the text between the quotes is in $+{guts}.
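For readers more familiar with Python, the same idea (a named group for the quote, and a negative lookahead that stops at the matching closing quote) might be sketched as follows. This is a loose translation under the assumption that Python's `re` syntax `(?P<name>...)` and `(?P=name)` stands in for Perl's `(?<name>...)` and `\k<name>`; the function name `quoted_pairs` is made up for the example.

```python
import re

QUOTED_RX = re.compile(
    r"""
    (?P<quote> ['"] )              # opening quote, remembered as "quote"
    (?P<guts>
        (?: (?! (?P=quote) ) . )*  # any character that is not that same quote
    )
    (?P=quote)                     # the matching closing quote
    """,
    re.VERBOSE | re.DOTALL,        # roughly the /x and /s flags of the Perl version
)

def quoted_pairs(text):
    """Return (quote character, quoted text) for every match."""
    return [(m.group("quote"), m.group("guts")) for m in QUOTED_RX.finditer(text)]

sample = """The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock."""
for quote, guts in quoted_pairs(sample):
    print(quote, guts)
```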

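This only works for U+0027 (APOSTROPHE) and U+0022 (QUOTATION MARK). If you want it to work for things like ‘this’ and “this”, you will have to get fancier. There is a \p{Quotation_Mark} property for any kind of quotation mark, \p{Pi} for initial punctuation, and \p{Pf} for final punctuation.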
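One possible way to handle typographic quotes without Unicode property classes (which Python's standard `re` module does not support) is to list the opening and closing characters explicitly. The sketch below is only an assumption about how one might approach it: the `QUOTE_PAIRS` table is hypothetical and deliberately incomplete, and `extract_paired` is a name invented for this example.

```python
import re

# Hypothetical, deliberately incomplete table of opening -> closing quotes.
QUOTE_PAIRS = {
    '"': '"',
    "'": "'",
    "\u201c": "\u201d",  # left/right double quotation marks
    "\u2018": "\u2019",  # left/right single quotation marks
    "\u00ab": "\u00bb",  # guillemets
}

# One alternative per pair: opener, lazily matched contents, matching closer.
PAIRED_RX = re.compile(
    "|".join(
        f"{re.escape(opener)}(.*?){re.escape(closer)}"
        for opener, closer in QUOTE_PAIRS.items()
    ),
    re.DOTALL,
)

def extract_paired(text):
    """Return the contents of each quoted span, whatever quote style was used."""
    # Exactly one capture group takes part in each match; lastindex points at it.
    return [m.group(m.lastindex) for m in PAIRED_RX.finditer(text)]

print(extract_paired("He said \u201chello\u201d and \u2018bye\u2019 and \"ok\""))
```

An explicit table trades generality for predictability: it cannot cover every script's quotation marks, but it guarantees that an opening mark is only closed by its intended partner.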

Answer 3

$s = 'The "cat and the hat" sat on a rat.  The \'mouse ran\' up the clock.';
preg_match_all('~([\'"])(.*?)\1~s', $s, $result);
print_r($result[2]);

Output (as seen on ideone):

Array
(
    [0] => cat and the hat
    [1] => mouse ran
)

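preg_match_all stores all the match results in an array of arrays. You can change how the results are arranged, but by default the first array contains the overall matches ($0 or $&), the second array contains the contents of the first capture group ($1), the third array the contents of the second group ($2), and so on.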

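In this case, $result[0] holds the complete quoted strings from all matches, $result[1] holds the quote characters, and $result[2] holds whatever was between the quotes.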
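To make that layout concrete, here is a small Python sketch that arranges matches the same way, one list per capture group. The helper `match_all_pattern_order` is hypothetical and written only for this illustration; it imitates the default ordering described above rather than reproducing PHP's implementation.

```python
import re

def match_all_pattern_order(pattern, text, flags=0):
    """Hypothetical helper: collect matches grouped by capture group index."""
    rx = re.compile(pattern, flags)
    # result[0] is for whole matches, result[i] for capture group i.
    result = [[] for _ in range(rx.groups + 1)]
    for m in rx.finditer(text):
        for i in range(rx.groups + 1):
            result[i].append(m.group(i))
    return result

s = """The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock."""
result = match_all_pattern_order(r"""(['"])(.*?)\1""", s, re.DOTALL)
print(result[0])  # whole matches, quotes included
print(result[1])  # the quote character used by each match
print(result[2])  # the text between the quotes
```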
