Question

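How do I strip <h1>including this content</h1>?

I know you can use strip_tags to remove the tags, but I want everything between them to be removed as well.

Any help would be appreciated.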

Answer 1

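When you are dealing with HTML, you should use an HTML parser to process it correctly. You can use PHP's DOMDocument and query the elements with DOMXPath, for example: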

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1') as $node) {
    $node->parentNode->removeChild($node);
}
$html = $doc->saveHTML();
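
For HTML fragments rather than full pages, the same idea can be wrapped in a function. This is only a sketch: the function name `removeElementsByTag`, the wrapper `<div>` and its `strip-wrapper` id, and the UTF-8 prolog are assumptions added for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper (not from the answer above): removes every element
// named $tagName, together with its contents, from an HTML fragment.
function removeElementsByTag(string $html, string $tagName): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: the wrapper <div> keeps loose text from being re-parented,
    // and the XML prolog is a commonly used hint that the input is UTF-8.
    $doc->loadHTML(
        '<?xml encoding="UTF-8"><div id="strip-wrapper">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The HTML parser lowercases element names, so lowercase the query too.
    foreach ($xpath->query('//' . strtolower($tagName)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the wrapper's children, not the implied html/body shell.
    $wrapper = $xpath->query('//div[@id="strip-wrapper"]')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo removeElementsByTag('Hello<h1>including this content</h1> There !!', 'h1');
```

The tag name is concatenated straight into the XPath expression, so this sketch assumes it comes from trusted code and not from user input.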

Answer 2

Try this:

preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', '<h1>including this content</h1>');

Example:

echo preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', 'Hello<h1>including this content</h1> There !!');

Output:

Hello There !!
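
A slightly hardened variation of the same regex, as a sketch. The function name `stripH1Blocks` is hypothetical, and the `i` flag and `\b` are assumed tweaks that the answer itself does not use; they let the pattern match uppercase tags and only the `h1` element itself.

```php
<?php
// Hypothetical wrapper around the pattern above, with two assumed tweaks:
// the "i" flag for uppercase tags and \b so that only h1 itself matches.
function stripH1Blocks(string $html): string
{
    // [\s\S]*? is lazy, so each match stops at the nearest closing tag,
    // and it also spans newlines inside the heading.
    $result = preg_replace('/<h1\b[^>]*>[\s\S]*?<\/h1\s*>/i', '', $html);
    // preg_replace returns null on failure; fall back to the input unchanged.
    return $result ?? $html;
}

$input = "Hello<H1 class=\"title\">first\nheading</H1> There<h1>second</h1> !!";
echo stripH1Blocks($input); // prints: Hello There !!
```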

Answer 3

If you want to remove all tags along with their content:

$yourString = 'Hello <div>Planet</div> Earth. This is some <span class="foo">sample</span> content!';
$regex = '/<[^>]*>[^<]*<[^>]*>/';
echo preg_replace($regex, '', $yourString);
#=> Hello  Earth. This is some  content!

HTML attributes can contain < or >. So if your HTML is too messy, this method will not work and you will need a DOM parser.
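
Nested tags are another case where this pattern gives a surprising result. The snippet below is a small check with made-up input strings; it only illustrates the behaviour of the regex above.

```php
<?php
$regex = '/<[^>]*>[^<]*<[^>]*>/';

// Flat markup: the opening and closing tag are matched as one pair,
// so the text between them is removed as well.
$flat = preg_replace($regex, '', 'Hello <div>Planet</div> Earth');

// Nested markup: the pattern pairs <div> with <b>, then </b> with </div>,
// so the inner text is left behind.
$nested = preg_replace($regex, '', 'Hello <div><b>Planet</b></div> Earth');

echo $flat, "\n";   // Hello  Earth
echo $nested, "\n"; // Hello Planet Earth
```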

Regex explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  [^<]*                    any character except: '<' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'

Answer 4

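You could use an XSLT stylesheet that matches every node to itself, except for h1 elements, which are matched to empty output, and then apply it to your document. That is probably a bit heavy for something this simple.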
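
A rough sketch of what that could look like in PHP, assuming the xsl extension (`XSLTProcessor`) is installed. The stylesheet, the constant `STRIP_H1_XSL`, and the function name `stripH1WithXslt` are written for this illustration and are not taken from the answer.

```php
<?php
// Assumed stylesheet: an identity transform plus an empty template for h1,
// so every node is copied as-is while h1 elements produce no output.
const STRIP_H1_XSL = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="h1"/>
</xsl:stylesheet>
XSL;

// Hypothetical helper: parses $html, applies the stylesheet, and returns
// the transformed document (a whole document, not just the fragment).
function stripH1WithXslt(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML(STRIP_H1_XSL);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    return (string) $processor->transformToXml($doc);
}
```

Calling `stripH1WithXslt($html)` returns the serialized document with the h1 elements gone, including the html and body elements that the parser adds around a fragment.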

Answer 5

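You can also use strip_tags to remove the tags and everything between them.

$html contains the HTML or PHP you want to strip the tags from.

Use strip_tags($html, "");

Try this; I think it will work for you.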
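
It is worth running a quick check before relying on this, since strip_tags on its own removes the markup but keeps the text that was inside it. The two-step combination below is an assumed workaround for illustration and is not part of this answer.

```php
<?php
$html = 'Hello<h1>including this content</h1> There !!';

// strip_tags removes the markup but leaves the text that was inside it.
$tagsOnly = strip_tags($html);
echo $tagsOnly, "\n"; // Helloincluding this content There !!

// Assumed two-step combination: drop the <h1> block first,
// then strip whatever other tags remain.
$withoutH1 = preg_replace('/<h1\b[^>]*>[\s\S]*?<\/h1\s*>/i', '', $html);
$cleaned = strip_tags($withoutH1);
echo $cleaned, "\n"; // Hello There !!
```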
