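PHP - preg_match / preg_replace question

Time: 2013-06-09 01:34:33

Tags: php regex preg-replace preg-match

I'm a bit confused about preg_match and preg_replace. I have a very long content string (from a blog), and I want to find, separate out, and replace all [caption] tags. Possible tags look like this: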

[caption]test[/caption]
[caption align="center" caption="test" width="123"]<img src="...">[/caption]
[caption caption="test" align="center" width="123"]<img src="...">[/caption]

This is my code (but I've found that it doesn't work the way I want...):

public function parse_captions($content) {
    if(preg_match("/\[caption(.*) align=\"(.*)\" width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $content, $c)) {
        $caption = $c[4];         
        $code = "<div>Test<p class='caption-text'>" . $caption . "</p></div>";
        // Here, I'd like to ONLY replace what was found above (since there can be
        // multiple instances)
        $content = preg_replace("/\[caption(.*) width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $code, $content);
    }
    return $content;
}

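2 Answers:

Answer 0 (score: 1)

The goal is to ignore where the caption text is located (in the caption attribute or between the tags). You can try this: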

$subject = <<<'LOD'
[caption]test1[/caption]
[caption align="center" caption="test2" width="123"][/caption]
[caption caption="test3" align="center" width="123"][/caption]
LOD;

$pattern = <<<'LOD'
~
\[caption                          # beginning of the tag
(?>[^]c]++|c(?!aption\b))*         # followed by anything but c and ]
                                   # or c not followed by "aption"

(?|                                # branch reset group
    caption="([^"]++)"[^]]*+]      # the content is inside the opening tag
  |                                # OR
    ]([^[]+)                       # outside
)                                  # end of branch reset group

\[/caption]                        # closing tag
~x
LOD;

$replacement = "<div>Test<p class='caption-text'>$1</p></div>";

echo htmlspecialchars(preg_replace($pattern, $replacement, $subject));

Pattern (condensed version):

$pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
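As a minimal sketch of how the condensed pattern could be dropped into the asker's function, the block below wraps it in a single preg_replace call. The function name replace_captions is hypothetical; the pattern and replacement string are the ones from this answer.

```php
<?php
// Hypothetical helper name; pattern and replacement are taken from the answer.
function replace_captions(string $content): string
{
    $pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
    $replacement = '<div>Test<p class=\'caption-text\'>$1</p></div>';

    // preg_replace() rewrites every match on its own, so no preg_match() guard
    // is needed. It returns null on a PCRE error; keep the original text then.
    return preg_replace($pattern, $replacement, $content) ?? $content;
}

$subject = <<<'LOD'
[caption]test1[/caption]
[caption align="center" caption="test2" width="123"][/caption]
LOD;

echo replace_captions($subject), "\n";
```

Each tag is replaced independently, which addresses the "multiple instances" concern in the question. One thing to keep in mind: as written, the pattern expects [/caption] directly after the opening tag whenever a caption attribute is present, so a tag like [caption caption="test" width="123"]<img src="...">[/caption] is left untouched.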

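Pattern explanation:

After the beginning of the tag, there can be some content before either the ] or the caption attribute. That content is described by: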

(?>                # atomic group
    [^]c]++        # all characters that are not ] or c, 1 or more times
  |                # OR
    c(?!aption\b)  # c not followed by aption (to avoid the caption attribute)
)*                 # zero or more times
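To see what this part consumes, here is a small sketch that puts an extra capture group around it. That extra group and the variable names are added purely for illustration and are not part of the answer's pattern.

```php
<?php
// Illustration only: the outer ( ... ) is added so we can inspect what the
// repeated atomic group consumes after "[caption".
$prefix = '~\[caption((?>[^]c]++|c(?!aption\b))*)~';

preg_match($prefix, '[caption align="center" caption="test2" width="123"]', $m);
$consumed = $m[1];       // ' align="center" ' : stops right before caption=

preg_match($prefix, '[caption]test1[/caption]', $n);
$consumedEmpty = $n[1];  // '' : nothing to skip before the closing ]
```

The c in "center" is accepted because it is not followed by "aption", while the c that starts the caption attribute stops the repetition.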

The branch reset group (?| lets several capture groups share the same number:

(?|
       # case: the target is in the caption attribute #
    caption="      # (you can replace it by caption\s*+=\s*+")
    ([^"]++)       # all that is not a " one or more times (capture group)
    "
    [^]]*+         # all that is not a ] zero or more times
    ]              # square bracket closing the opening tag

  |           # OR

       # case: the target is outside the opening tag #
    ]              # square bracket close the opening tag
    ([^[]+)        # all that is not a [ 1 or more times (capture group)
)

Both captures now have the same number, #1.
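The difference from an ordinary non-capturing group can be sketched with the two alternatives in simplified form (plain quantifiers instead of possessive ones; the variable names are illustrative):

```php
<?php
// Ordinary non-capturing group: each alternative keeps its own group number.
preg_match('~(?:caption="([^"]+)"|]([^[]+))~', ']outside[/caption]', $plain);
// $plain[1] is '' (the unused first branch), $plain[2] is 'outside'

// Branch reset group: both alternatives write to group 1.
preg_match('~(?|caption="([^"]+)"|]([^[]+))~', ']outside[/caption]', $reset);
// $reset[1] is 'outside'

preg_match('~(?|caption="([^"]+)"|]([^[]+))~', 'caption="inside"', $attr);
// $attr[1] is 'inside'
```

This is why a single $1 in the replacement string works for both tag forms.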

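Note: if you are sure that no caption tag spans several lines, you can add the m modifier at the end of the pattern, although m only changes how ^ and $ match and this pattern uses neither, so it makes no practical difference here.

Note 2: the quantifiers are possessive wherever possible, and I use atomic groups, so that the pattern fails fast and performs better.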

Answer 1 (score: 0)

A hint (rather than an answer, per se)

Your best course of action would be:

  1. Match everything after caption.

    preg_match("#\[caption(.*?)\]#", $q, $match)
    
  2. Use the explode function to extract the values (if any) from $match[1].

    explode(' ', trim($match[1]))
    
  3. Check the values in the returned array and use them in your code accordingly.
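The steps above can be sketched roughly as follows. The function name caption_attributes is hypothetical, and preg_match_all is swapped in for explode(' ', ...) in step 2, on the assumption that attribute values may contain spaces (which explode would split apart).

```php
<?php
// Hypothetical helper: returns the attributes of the first [caption ...] tag
// as an associative array, or an empty array if there are none.
function caption_attributes(string $q): array
{
    // Step 1: capture everything between "[caption" and the first "]".
    if (!preg_match('#\[caption(.*?)\]#', $q, $match)) {
        return [];
    }

    // Steps 2 and 3: collect name="value" pairs. preg_match_all() replaces
    // explode(' ', ...) here so a value containing a space stays in one piece.
    preg_match_all('#(\w+)="([^"]*)"#', $match[1], $pairs, PREG_SET_ORDER);

    $attributes = [];
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
    return $attributes;
}

$attrs = caption_attributes('[caption align="center" caption="my test" width="123"]<img src="...">[/caption]');
// $attrs['caption'] is 'my test'
```

With the attributes in an array, the calling code can decide whether to use the caption attribute or fall back to the text between the tags.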