Question

I have been working on this problem for a while.

I have this string (there is more content before and after the h2 tag):

...<h2 style='line-height: 44px;'><p>Lorem Ipsum</p></h2>...

What regular expression can I use to remove all <p> and </p> tags inside these heading tags?

I am trying something like this, but the positive lookbehind does not work:

// for the starting <p> tag
$str = preg_replace('/(?<=<h[1-6]{1}[^>]+>)\s*<p>/i', '', $str);
// for the ending </p> tag
$str = preg_replace('/<\/p>\s*(?=<\/h[1-6]{1}>\s*)/i', '', $str);

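This also does not account for paragraph tags located deeper in the text inside the <h2> tags.

[Update]

This is derived from one of the links suggested by PeeHaa: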

// for the starting <p> tag
$str = preg_replace("#(<h[1-6].*?>)<p.*?>#", '$1', $str);
// for the ending </p> tag
$str = preg_replace("#<\/p>(<\/h[1-6]>)#", '$1', $str);

Answer 1

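You should not try to parse HTML with regular expressions. That said, because this is a subset of HTML rather than a full document with nested layout, it is possible: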

// $1 = opening heading tag, $3 = paragraph content, $4 = closing heading tag
$str = preg_replace('/(<h([1-6])[^>]*>)\s?<p>(.*)?<\/p>\s?(<\/h\2>)/', '$1$3$4', $str);

Test case here:

http://codepad.org/oA2rtNP9
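
The pattern above only strips a <p> that directly wraps the heading content. A minimal sketch of one way to also cover paragraph tags deeper inside the heading, assuming the headings are not nested and the markup is reasonably well formed: match each heading with preg_replace_callback, then strip the paragraph tags from its content only. The function name stripParagraphsInHeadings is hypothetical and introduced here purely for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): removes every <p ...>
// and </p> tag found inside h1-h6 elements and keeps the text between them.
function stripParagraphsInHeadings(string $html): string
{
    $result = preg_replace_callback(
        // Group 1: opening heading tag, group 2: heading level,
        // group 3: heading content, group 4: closing tag of the same level.
        '#(<h([1-6])\b[^>]*>)(.*?)(</h\2>)#is',
        function (array $m): string {
            // \b keeps tags such as <pre> or <param> untouched.
            $inner = preg_replace('#</?p\b[^>]*>#i', '', $m[3]);
            return $m[1] . $inner . $m[4];
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$str = "<p>Intro</p><h2 style='line-height: 44px;'><p>Lorem Ipsum</p></h2><p>Outro</p>";
echo stripParagraphsInHeadings($str);
// prints: <p>Intro</p><h2 style='line-height: 44px;'>Lorem Ipsum</h2><p>Outro</p>
```

Paragraphs outside the headings are left alone because the inner replacement only ever sees the content captured between the opening and closing heading tags.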

Answer 2

PHP Parse HTML code

Parse Website for URLs

php - parse html page

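And there are many others (I could have added 100 more).

It basically comes down to this:

Do not try to parse HTML with regular expressions. HTML is not a regular language.

Use an HTML parser for this.

For example: http://php.net/manual/en/book.dom.php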

PHP: remove all paragraph tags inside heading tags
