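Regex to extract inner content from a WordPress shortcode

Time: 2015-12-15 17:57:38

Tags: regex wordpress

I have a WordPress shortcode that opens with [pullquote] and closes with [/pullquote]. I'm trying to get whatever is between the opening and closing tags.

I'm new to regular expressions, so I started with a simple expression that captures letters, digits, and whitespace.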

\[pullquote\]([0-9a-zA-z\s]*)\[\/pullquote\]

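That works fine, but it doesn't account for punctuation and so on, so I tried (.*), which matches too much and isn't specific enough.

Finally I tried this: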

\[pullquote\](^(?:\[\/pullquote\])*)\[\/pullquote\]

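I'm not clear on the terminology here, but basically I wanted to get anything that begins with [pullquote], as long as it isn't [/pullquote], and ends with [/pullquote].

At least on regexr.com it didn't work, which I assume means I did something wrong.

Text used on regexr: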

[/pullquote]

How can I get this working? Am I doing anything else wrong here?

Thanks

2 Answers:

Answer 0: (score: 1)

You need this:

(\[pullquote\])(.+)(\[\/pullquote\])

Then take only group 2, $2.

See it here: https://regex101.com/r/dS8eZ0/2

Information extracted from the link:

MATCH INFORMATION
"(\[pullquote\])(.+)(\[\/pullquote\])/g"
    1st Capturing group "(\[pullquote\])"
      "\[" matches the character [ literally         
      "pullquote" matches the characters pullquote literally (case sensitive)
      "\]" matches the character ] literally
    2nd Capturing group "(.+)"
     ".+" matches any character (except newline)
       "Quantifier: +" Between one and unlimited times, as many times as possible, 
                       giving back as needed [greedy]
    3rd Capturing group "(\[\/pullquote\])"
      "\[" matches the character [ literally
      "\/" matches the character / literally
      "pullquote" matches the characters pullquote literally (case sensitive)
      "\]" matches the character ] literally
  "g" modifier: global. All matches (don't return on first match)

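One caveat worth sketching: because (.+) is greedy and does not cross newlines by default, a post with several pullquotes would be captured from the first opening tag to the last closing tag on the same line, and a multi-line quote would not match at all. The following is a minimal PHP sketch, not part of the original answer, that assumes the post body is available in a hypothetical $post_content string. It shows a lazy quantifier with the s modifier, and a negative-lookahead variant that expresses what the question was attempting ("anything that isn't the closing tag").

```php
<?php
// Hypothetical sample text; not taken from the original question.
$post_content = "Intro [pullquote]First quote.[/pullquote] middle\n"
              . "[pullquote]Second,\nmulti-line quote![/pullquote] end";

// Lazy (.*?) stops at the nearest closing tag;
// the "s" modifier lets "." match newlines as well.
$lazy = '/\[pullquote\](.*?)\[\/pullquote\]/s';
preg_match_all($lazy, $post_content, $matches);
$quotes = $matches[1]; // contents of capturing group 1 for every match

// Variant: consume one character at a time, but only while the
// closing tag does not start at the current position.
$lookahead = '/\[pullquote\]((?:(?!\[\/pullquote\]).)*)\[\/pullquote\]/s';
preg_match_all($lookahead, $post_content, $matches_lookahead);
$quotes_lookahead = $matches_lookahead[1];

print_r($quotes);
print_r($quotes_lookahead);
```

Both patterns should capture the same two strings here; the lazy form is usually the easier one to read.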
Answer 1: (score: 1)

Here is a basic search using strpos(), which you might try for a performance comparison.

function extract_shortcode_content($needle, $haystack) {
    if (empty($needle) || empty($haystack) || !is_string($needle) || !is_string($haystack)) {
        throw new Exception('Bad input');
    }
    // $needle is just intended to be shortcode value (i.e. 'pullquote')
    // we will build appropriate start and end tags
    $needle_trimmed = trim(trim($needle), '[]');
    $start_code = '[' . $needle_trimmed . ']';
    $end_code = '[/' . $needle_trimmed . ']';
    $start_code_length = strlen($start_code);
    $end_code_length = strlen($end_code);
    $haystack_length = strlen($haystack);
    // last offset at which a start tag could still be followed by an end tag
    $last_searchable_position = $haystack_length - $start_code_length - $end_code_length;

    $return_array = array();

    // iterate through haystack extracting content
    $search_offset = 0;

    while ($search_offset <= $last_searchable_position) {
        $start_code_found = strpos($haystack, $start_code, $search_offset);
        if ($start_code_found === false) {
            // no match in remainder of string
            return $return_array;
        }

        // extract content
        $content_start_position = $start_code_found + $start_code_length;
        $end_code_found = strpos($haystack, $end_code, $content_start_position);
        if ($end_code_found === false) {
            // we couldn't find close for current shortcode open tag.
            // we don't count this as a match, so let's just return matches we have
            return $return_array;
        }
        $match_length = $end_code_found - $content_start_position;
        // add content to result array
        $return_array[] = substr($haystack, $content_start_position, $match_length);
        // set new search offset position for next iteration
        $search_offset = $end_code_found + $end_code_length;
    }

    return $return_array;
}

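Now, I'm not saying you should use this instead of the regex approach. The regex approach can certainly get the same result in a few lines of code. I'm just suggesting that this approach might perform better than a regex for this use case. However, that may be a micro-optimization for your use case and not worth the added code complexity.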
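For a side-by-side comparison, the regex route can be sketched with the same inputs (a shortcode name and a haystack). This is only an illustration: the function name extract_shortcode_content_regex and the sample text are hypothetical, and the pattern assumes shortcodes without attributes or nesting.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function extract_shortcode_content_regex(string $tag, string $haystack): array {
    // preg_quote() escapes regex metacharacters in the tag name,
    // including the "/" used as the pattern delimiter.
    $quoted = preg_quote(trim(trim($tag), '[]'), '/');

    // Lazy capture between [tag] and [/tag]; "s" lets "." match newlines.
    $pattern = '/\[' . $quoted . '\](.*?)\[\/' . $quoted . '\]/s';

    if (preg_match_all($pattern, $haystack, $matches) === false) {
        // preg_match_all() returns false on failure
        return array();
    }
    return $matches[1]; // captured inner content of every match
}

// Hypothetical sample input.
$text = '[pullquote]One, two![/pullquote] filler [pullquote]Three?[/pullquote]';
$found = extract_shortcode_content_regex('pullquote', $text);
print_r($found);
```

Timing both versions with microtime(true) on representative post content would show whether the difference matters in practice.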

I just wanted to offer an alternative to regex.