Question

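This is usually something I'd just hack at until I got it right, but in this case I believe it's a part of regular expressions I've never fully understood: the greedy vs. non-greedy stuff.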

I have content like this:

[quote=mick-mick topic=33586]
I just gave DayZ an hour of my life. I can never get that back. :/
I had to wait to wait. Slow loads just to get to the server selection screen then once I chose a server it took another almost 3 minutes to get into the server.

I'll still give H1Z1 a shot for sure. :)
[/quote]

This is a test

The regex I'm trying to use is:

/(\[quote=[a-zA-Z0-9]+\](.*)\[\/quote\])?(.*)/m

But it only matches the first line, the [quote=...] tag line.

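As you can see, I need the username (mick-mick), the topic ID, the quote contents, and the content after the quote. Also, the quote might not be present in the content at all.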

Can you help me with this? What am I missing? I'm using preg_match in PHP.

Answer 1

Final update:

Matching multiple quotes and also grabbing everything that isn't inside a quote is a bit more difficult. But here goes:

(?:
  \[quote=([a-z0-9\-]+)
  \s*topic=(\d+)\]
  (.*?)
  \[/quote\]
 |
  (.+?)
  (?=\[quote|$)
)

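This time we wrap everything in a non-capturing group with an alternation. We either match a quote (with capture groups 1, 2, and 3), or we match 1+ other characters into capture group 4 (that is, any part that isn't a quote). The key addition here is the positive lookahead ((?=...)). It's a zero-length assertion (meaning it only "checks" and doesn't consume anything) that looks for [quote or the end of the string ($) right after the current position. This keeps the lazy (.+?) from running past the start of the next quote.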
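A small sketch of just the lookahead part, on a made-up string (the `$text` and `$m2` names are illustrative only). The lazy `(.+?)` grows one character at a time and stops as soon as `(?=\[quote|$)` succeeds, and because the lookahead is zero-length, the `[quote` text itself is left for the next match.

```php
<?php
// Hypothetical input: plain text followed by a quote.
$text = "Intro text [quote=mick-mick topic=33586]Hi[/quote] outro";

// Lazy (.+?) stops right before "[quote"; the lookahead does not consume it.
preg_match('~(.+?)(?=\[quote|$)~s', $text, $m);
echo $m[1], "\n";

// With no quote at all, the lookahead falls back to the end of the string ($).
preg_match('~(.+?)(?=\[quote|$)~s', "No quote here", $m2);
echo $m2[1], "\n";
```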

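Note: to match globally in PHP, you need preg_match_all(). Because the pattern above is spread over several indented lines, it also needs the x (free-spacing) modifier, plus s so that . matches the newlines inside quotes.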
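A minimal sketch of running the multi-quote pattern with preg_match_all(), assuming the `s`, `i`, and `x` modifiers and a hypothetical post with two quotes (`$post` and `$parts` are illustrative names). `PREG_SET_ORDER` returns one array per match; PHP drops trailing unmatched groups, so index 4 only exists when the plain-text alternative matched.

```php
<?php
// The multi-quote pattern in free-spacing mode (x), with s and i.
$pattern = '~(?:
    \[quote=([a-z0-9\-]+)
    \s*topic=(\d+)\]
    (.*?)
    \[/quote\]
  |
    (.+?)
    (?=\[quote|$)
)~six';

// Hypothetical post: text, quote, text, quote, text.
$post = "Before\n[quote=mick-mick topic=33586]First[/quote]\nBetween\n[quote=bob topic=42]Second[/quote]\nAfter";

preg_match_all($pattern, $post, $matches, PREG_SET_ORDER);

$parts = [];
foreach ($matches as $m) {
    if (isset($m[4])) {
        // Second alternative: a run of text outside any quote.
        $parts[] = ['text', $m[4]];
    } else {
        // First alternative: username, topic ID, quote contents.
        $parts[] = ['quote', $m[1], $m[2], $m[3]];
    }
}
print_r($parts);
```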

Update

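I updated this to grab the content before and after the quote, and made the quote optional (by adding an optional non-capturing group: (?:...)?). I also re-read your question and saw that all of your quotes have a username and a topic (if that isn't always the case, you'll need to combine this with the more general expression further down). Here it is: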

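(.*?)(?=\[quote=|(?!.*\[quote=))(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)

And use it like this (the lookahead right after group 1 only lets the lazy (.*?) stop at a [quote= tag or at a point where no [quote= remains; without it, (.*?) never grows past zero characters, because the quote group is optional and (.*) can always match the rest):

preg_match('~(.*?)(?=\[quote=|(?!.*\[quote=))(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)~si', $html, $matches);
$matches[0]; // Full match
$matches[1]; // Before the quote (empty if the quote doesn't exist)
$matches[2]; // Quote value: `mick-mick`
$matches[3]; // Topic value: `33586`
$matches[4]; // Quote contents: `I just...`
$matches[5]; // Everything else (the entire string if the quote doesn't exist)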

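A minimal check of the optional-quote idea, assuming a hypothetical post that has text before the quote (`$post`, `$plain`, `$guarded` are illustrative names). It compares the pattern with no lookahead after group 1 against the same pattern with `(?=\[quote=|(?!.*\[quote=))` inserted after group 1.

```php
<?php
$post = "Hello\n[quote=mick-mick topic=33586]Quoted text[/quote]\nThis is a test";

// No lookahead: the quote group is optional, so the lazy (.*?) never has to grow.
$plain = '~(.*?)(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)~si';

// With lookahead: group 1 may only end at "[quote=" or where no "[quote=" remains.
$guarded = '~(.*?)(?=\[quote=|(?!.*\[quote=))(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)~si';

preg_match($plain, $post, $a);   // group 1 and the quote groups are empty; group 5 holds everything
preg_match($guarded, $post, $b); // group 1 is "Hello\n"; groups 2-4 hold the quote

// No quote at all: group 1 is empty and group 5 is the whole string.
preg_match($guarded, "Just text", $c);
```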

There are a few issues with your expression, but it's very close. Here's a cleaned-up version:

\[quote=([a-z0-9\-]+)\s*(.*?)\](.*)\[/quote\]

You can use it like this:

preg_match('~\[quote=([a-z0-9\-]+)\s*(.*?)\](.*)\[/quote\]~si', $html, $matches);
$matches[0]; // Full match
$matches[1]; // Quote value: `mick-mick`
$matches[2]; // Quote parameters: `topic=33586`
$matches[3]; // Quote contents: `I just...`

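Since this version captures the tag parameters as raw text (`topic=33586`), a follow-up step is needed to read individual values. The sketch below is one hypothetical way to do that, assuming parameters are always `key=value` pairs without spaces in the value; `$params` and the second pattern are illustrative, not part of the answer.

```php
<?php
$post = "[quote=mick-mick topic=33586]\nI just gave DayZ an hour of my life.\n[/quote]\n\nThis is a test";

preg_match('~\[quote=([a-z0-9\-]+)\s*(.*?)\](.*)\[/quote\]~si', $post, $m);

// $m[2] holds the raw parameter text; split it into key => value pairs.
preg_match_all('~(\w+)=(\S+)~', $m[2], $pairs, PREG_SET_ORDER);
$params = [];
foreach ($pairs as $p) {
    $params[$p[1]] = $p[2];
}

echo $m[1], "\n";            // username
echo $params['topic'], "\n"; // topic ID as a string
```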

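The root problem is that you wrapped everything in (...)? followed by (.*). That means the first part is optional and is allowed to fail, and then you match 0+ characters. Because . doesn't match newlines (unless you use the s modifier, as in my example; the m modifier you used only changes how ^ and $ behave), you matched only the first line, which is the quote tag.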
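A quick sketch reproducing this with the question's pattern on the sample content (`$orig` and `$withS` are illustrative names). With only `m`, `(.*)` stops at the first newline; adding `s` lets `(.*)` run to the end, but the optional quote group still fails, so everything lands in the last group.

```php
<?php
$post = "[quote=mick-mick topic=33586]\nI just gave DayZ an hour of my life.\n[/quote]\n\nThis is a test";

// The question's pattern with m: the quote group fails and (.*) stops at "\n".
preg_match('/(\[quote=[a-zA-Z0-9]+\](.*)\[\/quote\])?(.*)/m', $post, $orig);

// Same pattern with s added: (.*) now crosses newlines, but the quote group
// still fails, so the whole post ends up in group 3.
preg_match('/(\[quote=[a-zA-Z0-9]+\](.*)\[\/quote\])?(.*)/ms', $post, $withS);
```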

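Also, your quote tag ([quote=mick-mick topic=33586]) contains a hyphen, a space, and an extra equals sign, but you used quote=[a-zA-Z0-9]+, which stops at the first hyphen. Instead, I used [a-z0-9\-] (the i modifier makes it case-insensitive), then optional whitespace (\s*), then a lazy capture of the remaining parameters.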
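The character-class difference can be checked on the tag alone (`$tag`, `$questionClass`, `$answerClass` are illustrative names). preg_match() returns 1 on a match and 0 otherwise.

```php
<?php
$tag = '[quote=mick-mick topic=33586]';

// Question's class: matches "mick", then needs "]" but finds "-", so no match.
$questionClass = preg_match('~\[quote=[a-zA-Z0-9]+\]~', $tag);

// Answer's version: hyphens allowed in the name, optional whitespace,
// then a lazy capture of the remaining parameters up to "]".
$answerClass = preg_match('~\[quote=([a-z0-9\-]+)\s*(.*?)\]~i', $tag, $m);
```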

Let me know if you have any questions or want different functionality.

RegEx: Problem parsing a forum post body (with quotes)
