Question

I have a WordPress shortcode that opens with [pullquote] and closes with [/pullquote]. I am trying to capture whatever sits between the opening and closing tags.

I'm new to regular expressions, so I started with a simple expression that captures letters, digits, and whitespace.

\[pullquote\]([0-9a-zA-z\s]*)\[\/pullquote\]

That works, but it doesn't account for punctuation and the like, so I tried (.*), which matches too much and isn't specific enough.

Finally, I tried this:

\[pullquote\](^(?:\[\/pullquote\])*)\[\/pullquote\]

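I'm not clear on the terminology here, but basically I want to capture anything that starts with [pullquote], as long as it isn't [/pullquote], and ends with [/pullquote].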

It didn't work, at least on regexr.com, which I take to mean I'm doing something wrong.

Text used on regexr:

[/pullquote]

How can I make this work? Am I doing anything else wrong here?

Thanks

Answer 1

You need this:

(\[pullquote\])(.+)(\[\/pullquote\])

Take only group 2 ($2).

See it here: https://regex101.com/r/dS8eZ0/2

Information extracted from the link:

MATCH INFORMATION
"(\[pullquote\])(.+)(\[\/pullquote\])/g"
    1st Capturing group "(\[pullquote\])"
      "\[" matches the character [ literally         
      "pullquote" matches the characters pullquote literally (case sensitive)
      "\]" matches the character ] literally
    2nd Capturing group "(.+)"
     ".+" matches any character (except newline)
       "Quantifier: +" Between one and unlimited times, as many times as possible, 
                       giving back as needed [greedy]
    3rd Capturing group "(\[\/pullquote\])"
      "\[" matches the character [ literally
      "\/" matches the character / literally
      "pullquote" matches the characters pullquote literally (case sensitive)
      "\]" matches the character ] literally
  "g" modifier: global. All matches (don't return on first match)
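
One caveat with the pattern above: ".+" is greedy and does not match newlines. With two shortcodes on the same line, group 2 therefore runs from the first [pullquote] to the last [/pullquote], and a quote that spans several lines is not matched at all. Below is a minimal PHP sketch of a lazy variant, assuming PCRE through preg_match_all(). The function name extract_pullquotes and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: returns the inner content of every
// [pullquote]...[/pullquote] pair found in $text.
function extract_pullquotes(string $text): array {
    // ".*?" is lazy, so each match stops at the nearest closing tag;
    // the "s" modifier lets "." match newlines as well.
    $pattern = '/\[pullquote\](.*?)\[\/pullquote\]/s';
    if (preg_match_all($pattern, $text, $matches) === false) {
        return array(); // regex engine error: report no matches
    }
    return $matches[1]; // group 1 holds the inner content
}

// Illustrative input with two shortcodes, one spanning two lines.
$sample = "[pullquote]First, with punctuation![/pullquote] filler "
        . "[pullquote]Second\nline[/pullquote]";
print_r(extract_pullquotes($sample));
```

Because the tags are no longer captured, the content lands in group 1 rather than group 2. The "anything that isn't [/pullquote]" idea from the question can also be spelled out with a negative lookahead, (?:(?!\[\/pullquote\]).)*, but the lazy quantifier is usually the shorter way to get the same matches.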

Answer 2

Here is a basic search using strpos(), which you might want to try for a performance comparison.

function extract_shortcode_content($needle, $haystack) {
    if (empty($needle) || empty($haystack) || !is_string($needle) || !is_string($haystack)) {
        throw new Exception('Bad input');
    }
    // $needle is just intended to be shortcode value (i.e. 'pullquote')
    // we will build appropriate start and end tags
    $needle_trimmed = trim(trim($needle), '[]');
    $start_code = '[' . $needle_trimmed . ']';
    $end_code = '[/' . $needle_trimmed . ']';
    $start_code_length = strlen($start_code);
    $end_code_length = strlen($end_code);
    $haystack_length = strlen($haystack);
    // last offset at which an open tag still leaves room for a close tag
    $last_searchable_position = $haystack_length - $start_code_length - $end_code_length;

    $return_array = array();

    // iterate through haystack extracting content
    $search_offset = 0;

    while ($search_offset <= $last_searchable_position) {
        $start_code_found = strpos($haystack, $start_code, $search_offset);
        if ($start_code_found === false) {
            // no match in remainder of string
            return $return_array;
        }

        // extract content: look for the closing tag after the opening tag
        $content_start_position = $start_code_found + $start_code_length;
        $end_code_found = strpos($haystack, $end_code, $content_start_position);
        if ($end_code_found === false) {
            // we couldn't find close for current shortcode open tag.
            // we don't count this as a match, so let's just return matches we have
            return $return_array;
        }
        $match_length = $end_code_found - $content_start_position;
        // add content to result array
        $return_array[] = substr($haystack, $content_start_position, $match_length);
        // set new search offset position for next iteration
        $search_offset = $end_code_found + $end_code_length;
    }

    return $return_array;
}

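Now, I'm not saying you should use this instead of the regex approach. The regex approach can certainly get the same result in a few lines of code. I'm only suggesting that this approach may perform better than a regular expression for this use case. That said, it may well be a micro-optimization for your use case and not worth the extra code complexity.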

I just wanted to offer an alternative to the regex suggestion.
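
To see whether the difference matters for a given input, both approaches can be timed side by side. The following is a rough sketch, assuming PHP 7.3 or later for hrtime(). The names find_with_strpos, find_with_regex, and time_runs are hypothetical, compact stand-ins written for this comparison (not the function above verbatim), and the numbers will vary with the machine and the input.

```php
<?php
// Compact strpos()-based extraction (hypothetical stand-in).
function find_with_strpos(string $tag, string $text): array {
    $open = '[' . $tag . ']';
    $close = '[/' . $tag . ']';
    $found = array();
    $offset = 0;
    while (($start = strpos($text, $open, $offset)) !== false) {
        $content_start = $start + strlen($open);
        $end = strpos($text, $close, $content_start);
        if ($end === false) {
            break; // unclosed tag: stop and keep what we have
        }
        $found[] = substr($text, $content_start, $end - $content_start);
        $offset = $end + strlen($close);
    }
    return $found;
}

// Regex-based extraction with a lazy quantifier (hypothetical stand-in).
function find_with_regex(string $tag, string $text): array {
    $quoted = preg_quote($tag, '/');
    $pattern = '/\[' . $quoted . '\](.*?)\[\/' . $quoted . '\]/s';
    preg_match_all($pattern, $text, $m);
    return $m[1] ?? array();
}

// Run a callable $runs times and return the elapsed milliseconds.
function time_runs(callable $fn, int $runs): float {
    $t0 = hrtime(true); // nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (hrtime(true) - $t0) / 1e6;
}

// Illustrative input: 1000 lines, each containing one shortcode.
$text = str_repeat("intro [pullquote]Some quoted text, with punctuation![/pullquote] outro\n", 1000);
$strpos_ms = time_runs(function () use ($text) { find_with_strpos('pullquote', $text); }, 100);
$regex_ms = time_runs(function () use ($text) { find_with_regex('pullquote', $text); }, 100);
printf("strpos: %.2f ms, regex: %.2f ms\n", $strpos_ms, $regex_ms);
```

If the two timings are close for realistic post lengths, the shorter regex version is probably the easier one to maintain.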

Regex to extract inner content from a WordPress shortcode
