Question

I want to remove anchor tags from a given string with a PHP regex, but only when the anchor is not inside another tag.

Input:

Hi Hello <a href="#">World</a>. This is <div class="some">testing <a href="#">content</a>. some more content</div>

Output:

Hi Hello. This is <div class="some">testing <a href="#">content</a>. some more content</div>

Thanks in advance.

Answer 1

Something like this:

$string = 'replace <a href="x">A</a> but not <div> <a>B</a> in tag </div> but also <a>C</a><div></div>';

echo preg_replace('/<a[^>]*?>([^<]*)<\/a>(?![^<]*<\/)/i', '', $string);

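The negative lookahead ensures that the next tag after the anchor is not a closing tag (</), so the anchor is not enclosed in another tag.

The content of the tag is in capture group 1, in case you want to replace with '\1' instead of ''.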

If it is only about div tags, this one ignores anchors inside a div:

echo preg_replace('/<div.*?>.*?<\/div>\K|<a[^>]*?>([^<]*)<\/a>/i', '\1', $string);
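
For anyone who wants to run both variants side by side, here is a minimal sketch. The variable names are only illustrative, and the patterns are written with the i modifier alone, since PHP's preg functions replace all matches by default and do not accept a g modifier.

```php
<?php
$string = 'replace <a href="x">A</a> but not <div> <a>B</a> in tag </div> but also <a>C</a><div></div>';

// Variant 1: an anchor whose next following tag is not a closing tag.
$notEnclosed = '/<a[^>]*?>([^<]*)<\/a>(?![^<]*<\/)/i';

// Drop the anchor together with its text.
$removed = preg_replace($notEnclosed, '', $string);

// Keep the anchor text by substituting capture group 1.
$unwrapped = preg_replace($notEnclosed, '$1', $string);

// Variant 2: consume a whole <div>...</div> first; \K discards what was
// matched so far, so the div is left untouched and only anchors outside
// a div are rewritten.
$skipDivs = '/<div.*?>.*?<\/div>\K|<a[^>]*?>([^<]*)<\/a>/i';
$unwrappedOutsideDiv = preg_replace($skipDivs, '$1', $string);

echo $removed, PHP_EOL, $unwrapped, PHP_EOL, $unwrappedOutsideDiv, PHP_EOL;
```

Both variants rely on the markup being simple: the first assumes the anchor text contains no nested tags, and the second assumes divs are not nested inside each other.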

Answer 2

I don't think this is a job for regex, but you can also try the common trick with (*SKIP)(*FAIL):

'~(<(?!a\b)(\w+)\b(?>(?:(?!</?\2\b).)+(?1)?)*</\2>)(*SKIP)(*F)|<a\b.*?</a>\s*~si'

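The first part, before (*SKIP)(*F), recursively matches and skips every element that is not an <a>.
The second part, after the pipe |, is what finally gets matched: the remaining anchors, together with optional trailing whitespace.
Flags used: s (PCRE_DOTALL), i (PCRE_CASELESS)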

Try the pattern at regex101 or see eval.in for a PHP demo.
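
In case the demo links are not available, a minimal sketch of how the pattern might be applied to the input from the question with preg_replace (the variable names are my own):

```php
<?php
$input = 'Hi Hello <a href="#">World</a>. This is <div class="some">testing <a href="#">content</a>. some more content</div>';

// Left of the pipe: match any complete non-<a> element (recursing into
// nested elements of the same name via (?1)), then (*SKIP)(*F) rejects it
// and makes the engine resume after it.
// Right of the pipe: the anchors that are left, plus trailing whitespace.
$pattern = '~(<(?!a\b)(\w+)\b(?>(?:(?!</?\2\b).)+(?1)?)*</\2>)(*SKIP)(*F)|<a\b.*?</a>\s*~si';

$result = preg_replace($pattern, '', $input);

echo $result, PHP_EOL;
```

Note that only whitespace after the anchor is consumed, so the space that preceded the removed anchor is left in place.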

There may be a better solution using DOMDocument or another parser.
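
A rough sketch of what the DOMDocument route could look like. The helper name removeTopLevelAnchors and the wrapper id are made up for this example, and it assumes "not inside another tag" means "a direct child of the fragment". Non-ASCII input may additionally need an explicit encoding hint when loading.

```php
<?php
// Hypothetical helper (not from the answers): drop <a> elements that sit at
// the top level of an HTML fragment and leave nested anchors alone.
function removeTopLevelAnchors(string $html): string
{
    $doc = new DOMDocument();

    // Wrap the fragment so "top level" means "direct child of the wrapper".
    // The wrapper id is an arbitrary name chosen for this sketch.
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);

    // Copy the node list into an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query('a', $root)) as $anchor) {
        $root->removeChild($anchor);
    }

    // Serialize the wrapper's children, i.e. the fragment without the wrapper.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$input = 'Hi Hello <a href="#">World</a>. This is <div class="some">testing <a href="#">content</a>. some more content</div>';
echo removeTopLevelAnchors($input), PHP_EOL;
```

Unlike the regex approaches, this works on the parsed tree, so nesting depth and attribute contents do not affect which anchors are removed.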
