Question

Here is the content:

This is content that is a sample.
[md] Special Content Piece [/md]

This is some more content.

What I want is a preg_match_all expression that extracts the following from the content above:

[md] Special Content Piece [/md]

I tried this:

$pattern ="/\[^[a-zA-Z][0-9\-\_\](.*?)\[\/^[a-zA-Z][0-9\-\_]\]/";
preg_match_all($pattern, $content, $matches);

But it gives an empty array. Can anyone help?

Answer 1

$pattern = "/\[md\](.*?)\[\/md\]/";

More generally:

$pattern = "/\[[a-zA-Z0-9\-\_]+\](.*?)\[\/[a-zA-Z0-9\-\_]+\]/";

Or even better:

$pattern = "/\[\w+\](.*?)\[\/\w+\]/";

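And to match the opening tag against the closing tag:

$pattern = '/\[(\w+)\](.*?)\[\/\1\]/';

(Note that the tag name is then also returned in the matches array. The pattern is single-quoted because in a double-quoted PHP string \1 is read as an octal character escape rather than a backreference.)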
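To see what the backreference version returns, here is a minimal sketch run against the sample text from the question. The variable names are only illustrative.

```php
<?php
// Sample text from the question.
$content = "This is content that is a sample.\n"
         . "[md] Special Content Piece [/md]\n\n"
         . "This is some more content.";

// Single quotes keep \1 intact as a regex backreference.
$pattern = '/\[(\w+)\](.*?)\[\/\1\]/';

preg_match_all($pattern, $content, $matches);

// $matches[0]: full matches
// $matches[1]: tag names
// $matches[2]: text between the tags
print_r($matches);
```

Because `.` does not match newlines by default, this only finds tags whose opening and closing parts are on the same line; adding the `s` modifier would lift that restriction.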

Answer 2

You can use:

$pattern = '~\[([^]]++)]\K[^[]++(?=\[/\1])~';

Explanation:

~          #delimiter of the pattern
\[         #literal opening square bracket (must be escaped)

(          #open the capture group 1
  [^]]++     #all characters that are not ] one or more times
)          #close the capture group 1

]          #literal closing square bracket (no need to escape)

\K         #discard everything matched so far from the reported match

[^[]++     #all characters that are not [ one or more times

(?=        #open a lookahead assertion (this doesn't consume characters)
  \[/        #literal opening square bracket and slash
  \1         #back reference to the group 1
  ]          #literal closing square bracket
)          #close the lookahead
~

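Advantages of this pattern:

The result is the whole match, because \K discards everything matched before it, and because the lookahead only checks what follows without consuming characters, so the closing tag is not part of the match.

The character classes are defined by negation, so they are shorter to write and more permissive (you don't have to care which characters may appear inside).

The pattern checks that the opening and closing tags are the same, using the capture group / backreference mechanism.

Limitations:

This expression does not handle nested structures (which you didn't ask for). If you need that, please edit your question.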
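As a rough illustration of the first point, the sketch below runs the pattern on the question's sample text. With \K and the lookahead in place, the full match in $matches[0] is only the text between the tags, while group 1 still holds the tag name.

```php
<?php
// Sample text from the question.
$content = "This is content that is a sample.\n"
         . "[md] Special Content Piece [/md]\n\n"
         . "This is some more content.";

// \K drops "[md]" from the reported match; the lookahead checks for
// "[/md]" without consuming it.
$pattern = '~\[([^]]++)]\K[^[]++(?=\[/\1])~';

preg_match_all($pattern, $content, $matches);

// $matches[0]: inner text only
// $matches[1]: tag names
print_r($matches);
```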

For nested structures, you can use:

(?=(\[([^]]++)](?<content>(?>[^][]++|(?1))*)\[/\2]))

If your bbcode allows attributes:

(?=(\[([^]\s]++)[^]]*+](?<content>(?>[^][]++|(?1))*)\[/\2]))

If self-closing bbcode tags are allowed:

(?=((?:\[([^][]++)](?<content>(?>[^][]++|(?1))*)\[/\2])|\[[^/][^]]*+]))

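Notes:

A lookahead (?=..) means, in other words: "followed by".

I use possessive quantifiers (++) instead of plain greedy quantifiers (+) to tell the regex engine that it doesn't need to backtrack (a performance gain), and atomic groups (i.e. (?>..)) for the same reason.

In the nested-structure patterns the slashes are not escaped; to use them as they are, you must choose a delimiter that is not a slash (~, #, `).

The nested-structure patterns use recursion (i.e. (?1)); the PHP manual's section on recursive patterns has more information about this feature.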
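The first nested-structure pattern can be tried as follows. This is a sketch, not a full bbcode parser: the `~` delimiter and the nested sample string are choices made for this example. Since the whole pattern sits inside a lookahead, the full match is empty and the useful data is in the capture groups.

```php
<?php
// Nested sample: [foo] sits inside [md].
$content = '[md] Special Content [foo]Piece[/foo] [/md]';

// ~ is used as the delimiter so the slash needs no escaping.
$pattern = '~(?=(\[([^]]++)](?<content>(?>[^][]++|(?1))*)\[/\2]))~';

preg_match_all($pattern, $content, $matches);

// $matches[1]: each complete tag block, outer one first
// $matches[2]: tag names
// $matches['content']: text between each pair of tags
print_r($matches[2]);
print_r($matches['content']);
```

Wrapping the pattern in a lookahead is what lets overlapping results be reported: after finding the outer [md] block, the engine moves forward one position at a time and can still find the inner [foo] block.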

Answer 3

Update
If you might be dealing with nested "tags", I would probably go for something like this:

$pattern = '/(\[\s*([^\]]++)\s*\])(?=(.*?)(\[\s*\/\s*\2\s*\]))/';

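As you can probably tell, this is not unlike what CasimiretHippolyte suggested (only his first regex, AFAIKT, won't capture the outer tag in a situation like the following):

This is content that is a sample.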
[md] Special Content [foo]Piece[/foo] [/md]

This is some more content.

With this expression, however, $matches looks like:

array (
  0 => 
  array (
    0 => '[md]',
    1 => '[foo]',
  ),
  1 => 
  array (
    0 => '[md]',
    1 => '[foo]',
  ),
  2 => 
  array (
    0 => 'md',
    1 => 'foo',
  ),
  3 => 
  array (
    0 => ' Special Content [foo]Piece[/foo] ',
    1 => 'Piece',
  ),
  4 => 
  array (
    0 => '[/md]',
    1 => '[/foo]',
  ),
)

A fairly simple pattern that matches all substrings that look like [foo]sometext[/foo]:

$pattern = '/(\[[^\/\]]+\])([^\]]+)(\[\s*\/\s*[^\]]+\])/';

if (preg_match_all($pattern, $content, $matches))
{
    echo '<pre>';
    print_r($matches);
    echo '</pre>';
}

Output:

array (
  0 => 
  array (
    0 => '[md] Special Content Piece [/md]',
  ),
  1 => 
  array (
    0 => '[md]',
  ),
  2 => 
  array (
    0 => ' Special Content Piece ',
  ),
  3 => 
  array (
    0 => '[/md]',
  ),
)

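How this pattern works: it is split into three groups.
The first, (\[[^\/\]]+\]), matches an opening [ and a closing ], with everything in between that is neither a closing bracket nor a forward slash.
The second, ([^\]]+), matches every character after the first group that is not a ].
The third, (\[\s*\/\s*[^\]]+\]), matches an opening [, followed by zero or more whitespace characters, a forward slash, zero or more whitespace characters again, and then any characters that are not ], up to the closing ].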

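If you want the closing tag to match the specific opening tag, while keeping the same three groups (plus a fourth), use this (slightly more complex) expression: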

$pattern = '/(\[\s*([^\]]+?)\s*\])(.+?)(\[\s*\/\s*\2\s*\])/';

This returns:

array (
  0 => 
  array (
    0 => '[md] Special Content Piece [/md]',
  ),
  1 => 
  array (
    0 => '[md]',
  ),
  2 => 
  array (
    0 => 'md',
  ),
  3 => 
  array (
    0 => ' Special Content Piece ',
  ),
  4 => 
  array (
    0 => '[/md]',
  ),
)

Note that group 2 (the one we refer to in the expression as \2) is the tag name itself.
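Since group 2 holds the tag name, the matches can be turned into a tag-to-content map. The sketch below assumes a hypothetical input with two sibling tags and uses the PREG_SET_ORDER flag so that each match arrives as one array; the $byTag name is only illustrative.

```php
<?php
// Hypothetical input with two sibling tags.
$content = '[md] Special Content Piece [/md] and [b]bold[/b]';

$pattern = '/(\[\s*([^\]]+?)\s*\])(.+?)(\[\s*\/\s*\2\s*\])/';

$byTag = [];
if (preg_match_all($pattern, $content, $sets, PREG_SET_ORDER)) {
    foreach ($sets as $set) {
        // $set[2] is the tag name, $set[3] the text between the tags.
        $byTag[$set[2]] = trim($set[3]);
    }
}

print_r($byTag);
```

If the same tag name occurs more than once, later occurrences overwrite earlier ones in this map; collecting the values into a list per tag would avoid that.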
