Extract quotations from a string

Date: 2013-07-11 21:42:36

Tags: php string-matching double-quotes

In PHP, given a long piece of text such as:

  

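Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”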

I would like to extract all of the quoted material, which in this case means an array containing these results:

"I looked at it this way, the governor’s going to be O.K.,"
"the Daves and Robbies, who represents the Emilys and Amys?"
"As attorney general,"
"I choose you."

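Assumptions:

  • There will always be a matching opening and closing quote
  • Simple double quotes

Bonus points if you also make sure it handles curly quotes, single quotes and other special cases, but feel free to stick with plain double quotes if that makes it easier.

And yes, I have searched the site for answers. Some things looked helpful, but I did not find anything that worked. The closest was this, but no dice: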

preg_match_all('/"([^"]*(?:\\"[^"]*)*)"/', $content, $matches)

5 Answers:

Answer 0: (score: 1)

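You could try splitting the string in PHP (for example with explode()).

Pseudocode:

Split everything into an array using " as the split parameter, then use % (modulo 2) to select only the "middle" pieces of text in the array. To handle curly quotes and the like, just convert all instances to straight quotes first.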
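The pseudocode above can be sketched roughly as follows. This is only an illustration: the function name extractQuoted is made up, and it assumes the quotes are balanced and that any curly quotes are UTF-8 encoded.

```php
<?php
// Hypothetical helper illustrating the split-and-modulo idea.
function extractQuoted(string $text): array
{
    // Convert curly double quotes (UTF-8 bytes) to straight ones first.
    $text = str_replace(array("\xe2\x80\x9c", "\xe2\x80\x9d"), '"', $text);

    $parts = explode('"', $text);
    $quoted = array();
    foreach ($parts as $i => $part) {
        // Odd-indexed pieces lie between an opening and a closing quote.
        if ($i % 2 === 1) {
            $quoted[] = $part;
        }
    }
    return $quoted;
}

print_r(extractQuoted('"As attorney general," she said, "I choose you."'));
```

If a quote is left unclosed, the text after it would be reported as quoted, so this only holds under the "always matched" assumption from the question.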

Answer 1: (score: 1)

$string = 'Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”';

// Normalize quotes
$search = array("\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x98", "\xe2\x80\x99"); 
$replace = array('"', '"', "'", "'");
$newstring = str_replace($search, $replace, $string);

// Extract text
$regex = "/\"(.*)\"/U";  
preg_match_all ($regex, $newstring, $output);  

if(isset($output[1])) {
    print_r($output[1]);
} else {
    echo $newstring;
}

This should give:

Array
(
    [0] => I looked at it this way, the governor's going to be O.K.,
    [1] => the Daves and Robbies, who represents the Emilys and Amys?
    [2] => As attorney general,
    [3] => I choose you.
)

Answer 2: (score: 1)

You can use this:

$matches = array();
preg_match_all('/(\“.*\”)/U', str_replace("\n", " ", $str), $matches);
print_r($matches);

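Note that I am replacing the line breaks with spaces, so that a quotation that starts on one line and ends on another is still matched.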
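As an alternative that is not part of this answer, the s modifier lets the dot match newlines, so the text does not have to be altered first. A minimal sketch, with a made-up sample string and the curly quotes written as UTF-8 byte escapes:

```php
<?php
// Sample text with a quotation that spans two lines (UTF-8 curly quotes).
$str = "She said \xe2\x80\x9cfirst line\nsecond line\xe2\x80\x9d and left.";

// s: the dot also matches newlines; U: quantifiers become lazy;
// u: pattern and subject are treated as UTF-8.
preg_match_all("/\xe2\x80\x9c(.*)\xe2\x80\x9d/sUu", $str, $matches);

print_r($matches[1]);
```

Unlike the str_replace() approach, this keeps the original line break inside the captured text.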

Answer 3: (score: 1)

One of the simplest ways, though not the best, is to find the " with strpos() and then cut the string with substr().

$string = 'Your long text "with quotation"';

$occur = strpos($string, '"'); // the first occurrence of "
$occur2 = strpos($string, '"', $occur + 1); // the second occurrence of "

$start = $occur; // where the cut starts
$length = $occur2 - $occur + 1; // length of the quoted text, including both quotes

$res = substr($string, $start, $length); // the quoted text, e.g. "with quotation"

You can put this in a loop to get multiple quoted strings:

    $string = 'Your long text "with quotation" Another long text "and text with quotation"';

    $offset = 0;      // where the next search starts
    $resString = '';  // use this if you want a string rather than an array
    $res = array();

    // Keep going while there is another opening quote
    while (($occur = strpos($string, '"', $offset)) !== false) {
        $occur2 = strpos($string, '"', $occur + 1); // the matching closing quote
        if ($occur2 === false) {
            break; // unmatched opening quote: stop instead of looping forever
        }

        $length = $occur2 - $occur + 1; // includes both quotes
        $quoted = substr($string, $occur, $length);

        $res[] = $quoted;      // collect as an array
        $resString .= $quoted; // or concatenate into a string

        $offset = $occur2 + 1; // continue after the closing quote
    }

    echo $resString . '<br>';
    print_r($res);

Result:

 "with quotation""and text with quotation"
 or
 Array ( [0] => "with quotation" [1] => "and text with quotation" )

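This is a simple way that does not use regular expressions. I hope it helps someone :) (sorry for my bad English)

Answer 4: (score: 1)

You can do it like this: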

<meta charset="UTF-8" />
<pre>
<?php
$pattern = '~(?|"((?>[^"]++|(?<=\\")")*)"|“((?>[^”]++|(?<=\\”)”)*)”)~u';

$text = <<<LOD
Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”
LOD;

preg_match_all ($pattern, $text, $matches);
print_r($matches[1]);

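Since you are working with Unicode characters, you must add the u modifier at the end of the pattern.

You can easily add further alternatives in the same way, for example for single quotes: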
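A small illustration of what the u modifier changes, using a made-up one-character subject. Without it, PCRE works byte by byte, so a multi-byte character such as ” inside a character class is treated as separate bytes rather than as one character.

```php
<?php
// "\xc3\xa9" is the two-byte UTF-8 encoding of é.
$char = "\xc3\xa9";

// Without u, the dot matches a single byte, so a two-byte character does not fit ^.$
$withoutU = preg_match('/^.$/', $char);

// With u, the dot matches one whole UTF-8 character.
$withU = preg_match('/^.$/u', $char);

var_dump($withoutU, $withU); // int(0) and int(1)
```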

$pattern = '~(?|"((?>[^"]++|(?<=\\")")*)"|“((?>[^”]++|(?<=\\”)”)*)”|\'((?>[^\']++|(?<=\\\')\')*)\')~u';

Note that the structure is always the same:

(?|
    "((?>[^"]++|(?<=\\")")*)"
  |
    “((?>[^”]++|(?<=\\”)”)*)”
  |
    \'((?>[^\']++|(?<=\\\')\')*)\'
)
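Because every branch has the same shape, the pattern could also be generated from a list of quote pairs. The sketch below is an assumption-laden simplification, not the answer's pattern: buildQuotePattern is a hypothetical helper, each delimiter is assumed to be a single character, and escaped quotes are not handled.

```php
<?php
// Hypothetical helper, not part of the answer: one branch per quote pair.
// Assumes each delimiter is a single character and ignores escaped quotes.
function buildQuotePattern(array $pairs): string
{
    $branches = array();
    foreach ($pairs as $pair) {
        $open  = preg_quote($pair[0], '~');
        $close = preg_quote($pair[1], '~');
        // Opening delimiter, captured run of non-closing characters, closing delimiter.
        $branches[] = $open . '([^' . $close . ']*)' . $close;
    }
    // (?| ... ) is a branch reset group: every branch captures into group 1.
    return '~(?|' . implode('|', $branches) . ')~u';
}

$pattern = buildQuotePattern(array(
    array('"', '"'),
    array("\xe2\x80\x9c", "\xe2\x80\x9d"), // curly double quotes as UTF-8 bytes
    array("'", "'"),
));

$text = "He said \"hi\", then \xe2\x80\x9cbye\xe2\x80\x9d.";
preg_match_all($pattern, $text, $matches);
print_r($matches[1]);
```

Keep in mind that a single-quote branch will also pick up apostrophes in ordinary prose, so it is probably only useful on text where apostrophes have been handled separately.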