Question

For example, I have this HTML:

<strong>this one</strong> <span>test one</span>
<strong>this two</strong> <span>test two</span>
<strong>this three</strong> <span>test three</span>

How can I use a regular expression to get all the text inside the strong and span tags?

Answer 1

Use the DOM, and never use regular expressions to parse HTML.

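// $html holds the markup from the question
$dom = new DOMDocument;
$dom->loadHTML($html);
// Print the text of every <strong> element...
foreach ($dom->getElementsByTagName('strong') as $tag) {
    echo $tag->nodeValue . "<br>";
}
// ...then the text of every <span> element
foreach ($dom->getElementsByTagName('span') as $tag) {
    echo $tag->nodeValue . "<br>";
}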

Output:

this one
this two
this three
test one
test two
test three

Demo
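If you want the values back as arrays instead of echoed, the same DOM approach can be wrapped in a small function. This is only a sketch: `textsByTag` is a hypothetical helper name, and the `libxml_use_internal_errors` calls are an assumption added so that messier real-world markup does not print parser warnings.

```php
<?php
// Same markup as in the question.
$html = '<strong>this one</strong> <span>test one</span>
<strong>this two</strong> <span>test two</span>
<strong>this three</strong> <span>test three</span>';

/**
 * Hypothetical helper: returns the text content of every element
 * with the given tag name, in document order.
 */
function textsByTag(string $html, string $tagName): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName($tagName) as $node) {
        $texts[] = $node->nodeValue;
    }
    return $texts;
}

$strong = textsByTag($html, 'strong');
$span = textsByTag($html, 'span');

// Prints all <strong> texts, then all <span> texts, one per line.
echo implode(PHP_EOL, array_merge($strong, $span)), PHP_EOL;
```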

Why shouldn't I use regular expressions to parse HTML content?

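HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. So many times, but it is not getting to me. Even the enhanced irregular regular expressions used by Perl are not up to the task of parsing HTML.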

That article comes from our own Jeff Atwood. Read more here.
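A concrete case of the nesting problem, as a minimal sketch with a hypothetical input: a lazy pattern in the style of the regex approach stops at the first closing tag, while `DOMDocument` tracks nesting and returns the outer element's full text.

```php
<?php
// Hypothetical input: one <span> nested inside another.
$html = '<span>outer <span>inner</span> tail</span>';

// The lazy regex stops at the FIRST closing </span> it meets, so the
// capture cuts the outer element short and drags in the inner opening tag.
preg_match('/<span>(.*?)<\/span>/', $html, $m);
$regexText = $m[1];

// A DOM parser tracks nesting, so the outer element's text is complete.
$dom = new DOMDocument();
$dom->loadHTML($html);
$domText = $dom->getElementsByTagName('span')->item(0)->nodeValue;

var_dump($regexText); // string(17) "outer <span>inner"
var_dump($domText);   // string(16) "outer inner tail"
```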

Answer 2

Use DOMDocument to load the HTML string, then use an XPath expression to get the values you need:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//strong | //span') as $node) {
    echo $node->nodeValue, PHP_EOL;
}

Output:

this one
test one
this two
test two
this three
test three

Demo
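Because XPath can navigate relative to a node, it can also keep each strong text together with the span that follows it. This is a sketch that assumes every `<strong>` is followed by its matching `<span>` as a sibling, as in the question's markup; the `following-sibling::span[1]` step and the `$pairs` array are illustrative choices, not part of the answer above.

```php
<?php
// Same markup as in the question.
$html = '<strong>this one</strong> <span>test one</span>
<strong>this two</strong> <span>test two</span>
<strong>this three</strong> <span>test three</span>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// For every <strong>, look up the first <span> sibling that comes after it,
// using the <strong> node as the XPath context node.
$pairs = [];
foreach ($xpath->query('//strong') as $strong) {
    $span = $xpath->query('following-sibling::span[1]', $strong)->item(0);
    $pairs[$strong->nodeValue] = $span !== null ? $span->nodeValue : null;
}

print_r($pairs);
```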

Answer 3

You can use capturing groups. Here are some examples:

<strong>([^\<]*)<\/strong>

Demo: http://regex101.com/r/sK5uF2

and

<span>([^\<]*)<\/span>

Demo: http://regex101.com/r/vJ2kP3

In each one, the first capturing group is your text: \1 or $1.
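In PHP, one way to apply these patterns is `preg_match_all`, where capturing group 1 ends up in `$matches[1]` (the `\1` / `$1` syntax is what you would use in a `preg_replace` replacement string). A minimal sketch, assuming the question's markup and no nested tags inside the elements, since `[^\<]*` stops at the first `<`:

```php
<?php
// Same markup as in the question.
$html = '<strong>this one</strong> <span>test one</span>
<strong>this two</strong> <span>test two</span>
<strong>this three</strong> <span>test three</span>';

// The same patterns as above, wrapped in "/" delimiters for PCRE.
preg_match_all('/<strong>([^\<]*)<\/strong>/', $html, $strongMatches);
preg_match_all('/<span>([^\<]*)<\/span>/', $html, $spanMatches);

// Index 0 holds the full matches; index 1 holds capturing group 1.
print_r($strongMatches[1]);
print_r($spanMatches[1]);
```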
