Question

Suppose there are two standard HTML email links:

<a href="mailto:test@test.com">test@test.com</a>
<a href="mailto:test@test.com" nosecure>test@test.com</a>

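In PHP, I want to find only the email links that do not have the nosecure attribute. Something like \<a\b(?![^>]*\bnosecure\b)[^>]*>[^<]*<\/a> does that so far, but now I want one group for the value of the href attribute and one group for the text inside the <a>...</a> tag. The second group is easy: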

\<a\b(?![^>]*\bnosecure\b)[^>]*>([^<]*)<\/a>

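But how do I get the first group? There can be any number of other characters before or after the href attribute, and nosecure can appear either before or after href.
How do I get a regex group for the value in href="mailto:<group>"? Also, ' may be used instead of ".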

Test cases and my current attempt: https://regex101.com/r/RNEZO3/2
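For comparison with the DOM approach in the answer, here is a minimal regex sketch of what the question asks for. It is an illustration under stated assumptions, not a robust HTML parser: it assumes attribute values never contain > and that the word nosecure only ever appears as the attribute name. The idea is to capture the opening quote in group 1 and close on the backreference \1, so either ' or " works; group 2 is the address and group 3 is the link text. The third and fourth sample links are made up for testing.

```php
<?php
// Sketch only: assumes no '>' inside attribute values and that "nosecure"
// appears only as a bare attribute name (hypothetical sample data below).
$html = <<<HTML
<a href="mailto:test@test.com">test@test.com</a>
<a href="mailto:test@test.com" nosecure>test@test.com</a>
<a class="x" href='mailto:other@example.com' title="y">Other</a>
<a nosecure href='mailto:hidden@example.com'>hidden@example.com</a>
HTML;

// (?!...)           reject the tag if nosecure occurs anywhere before its '>'
// (["\'])           group 1: whichever quote opens the href value
// (.*?)\1           group 2: the address, up to the matching closing quote
// ([^<]*)           group 3: the link text
$pattern = '/<a\b(?![^>]*\bnosecure\b)[^>]*?\bhref\s*=\s*(["\'])mailto:(.*?)\1[^>]*>([^<]*)<\/a>/i';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[] = [$m[2], $m[3]];
    echo $m[2], ' | ', $m[3], PHP_EOL;
}
```

Only the first and third links match; the two nosecure links are skipped whether the attribute comes before or after href.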

Thanks for your help :) Regards

Answer 1

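Never use regular expressions to parse HTML. Always use a DOM parser! It's easier than you think; you only need to learn a little XPath to find the attribute (or its absence) and the text content.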

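<?php
$html = <<<HTML
<div>
<a href="mailto:test@test.com">test@test.com</a>
<a href="mailto:test@test.com" nosecure>test@test.com</a>
</div>
HTML;

// Collect libxml parse warnings instead of printing them (real-world HTML is rarely clean)
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

/* href attribute of every <a> without a nosecure attribute */
$result = $xpath->query("//a[not(@nosecure)]/@href");
foreach ($result as $node) {
    // strip the "mailto:" scheme, leaving only the address
    echo str_replace("mailto:", "", $node->value), PHP_EOL;
}

/* text content of the same links */
$result = $xpath->query("//a[not(@nosecure)]/text()");
foreach ($result as $node) {
    echo $node->textContent, PHP_EOL;
}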
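The two queries above return the hrefs and the texts as separate lists, so a link with no direct text node (or with several) would put them out of step. A minimal variant, sketched here under the assumption that only mailto: links matter, selects each link element once and reads both values from that element. The third sample link is made up to show that single-quoted attributes need no special handling with a DOM parser; note that XPath starts-with() is case-sensitive.

```php
<?php
$html = <<<HTML
<div>
<a href="mailto:test@test.com">test@test.com</a>
<a href="mailto:test@test.com" nosecure>test@test.com</a>
<a href='mailto:other@example.com'>Other</a>
</div>
HTML;

// Collect libxml parse warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = [];
// One node per link: mailto: href present, nosecure attribute absent
foreach ($xpath->query("//a[starts-with(@href, 'mailto:') and not(@nosecure)]") as $a) {
    $links[] = [
        // drop the leading "mailto:" (7 characters) to keep just the address
        'email' => substr($a->getAttribute('href'), strlen('mailto:')),
        // all descendant text, so nested markup like <b> is included
        'text'  => $a->textContent,
    ];
}

foreach ($links as $link) {
    echo $link['email'], ' | ', $link['text'], PHP_EOL;
}
```

Because the address and the text come from the same element, each pair always belongs together.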

Regex to protect email addresses
