In my PHP script, a variable contains the following HTML.
<div>
first line starting text <span class='highlight blink'> first line middlte text1 </span> first line end text.
second line starting text <span class="target"> second line middlte text2 </span> second line end text
<div class="highlight blink"> third line text</div>
</div>
I want to remove the tags that have the highlight class, so that the HTML above looks like this (using regular expressions only):
<div>
first line starting text first line middlte text1 first line end text.
second line starting text <span class="target"> second line middlte text2 </span> second line end text
third line text
</div>
I tried this, but it does not replace the div tag that has multiple classes (see the third line; the div tag must be removed):
$data = preg_replace('#<(\w+) class=["\']highlight["\']>(.*)<\/\1>#', '\2', $data);
I tried this, but it replaces every tag that has a class. (See the second line; the span tag with the target class should stay unchanged.)
$data = preg_replace('#<(\w+) class=["\'](\w+)["\']>(.*)<\/\1>#', '\2', $data);
Can anyone help? Thanks in advance; I have been trying for two days.
Answer 0 (score: 2)
How about not using a regular expression?
<?php
// your HTML string
$string = <<<HTML
<div>
first line starting text <span class='highlight blink'> first line middlte text1 </span> first line end text.
second line starting text <span class="target"> second line middlte text2 </span> second line end text
<div class="highlight blink"> third line text</div>
</div>
HTML;
// classname
$classname = 'highlight';
$doc = new DOMDocument();
// load HTML and remove doctype, html, body tags in PHP >= 5.4.0
$doc->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// load HTML and remove doctype, html, body tags in PHP < 5.4.0
/*
$doc->loadHTML($string);
$doc->removeChild($doc->doctype);
$doc->replaceChild($doc->firstChild->firstChild->firstChild, $doc->firstChild);
*/
$finder = new DOMXPath($doc);
/** @var DOMNodeList $nodes */
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
/** @var DOMElement $node */
foreach ($nodes as $node) {
/** @var DOMElement $parent */
$parent = $node->parentNode;
/** @var DOMText $child */
$child = $doc->createTextNode(trim($node->nodeValue));
$parent->insertBefore($child, $node);
$parent->removeChild($node);
}
var_dump($doc->saveHTML());
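Note that the loop above replaces each matched element with its trimmed text content, so any markup nested inside it (a <b>, for example) is lost. If the nested markup should survive, the children can be moved out of the wrapper instead. The following is only a minimal sketch of that idea: unwrapByClass is a hypothetical helper name, not part of the original answer, and it assumes the matched element is never the root element of the fragment.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// remove every element carrying $classname but keep its child nodes.
// Assumes a matched element is never the root of the fragment.
function unwrapByClass(string $html, string $classname): string
{
    $doc = new DOMDocument();
    // Same flags as above: no implied <html>/<body>, no default doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $finder = new DOMXPath($doc);
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

    // Copy the node list into an array before changing the tree.
    foreach (iterator_to_array($finder->query($query)) as $node) {
        // insertBefore() moves an existing node, so each child leaves the
        // wrapper and lands directly in front of it, in order.
        while ($node->firstChild !== null) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        // The wrapper is now empty and can be dropped.
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML();
}

$html = "<div>start <span class='highlight blink'>a <b>bold</b> word</span> end</div>";
echo unwrapByClass($html, 'highlight');
```

Unlike the text-node approach, this keeps the original whitespace inside the removed tag, so trim it separately if that matters.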
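Answer 1 (score: 1)
It can be done with a regular expression (but see one of my previous answers for when it is safe to do so).
Nevertheless, this particular case is quite difficult, because you have to consider every possible configuration of the tags, and you may end up matching content you did not want. I strongly recommend using an HTML parser like the one suggested here.
Anyway, a possible solution that tries to be as generic and safe as possible could be: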
$data = "<div>
first line starting text <span class='highlight blink'> first line <b>middlte</b> text1 </span> first line end text.
second line starting text <span class='target'> second line middlte text2 </span> second line end text
<div class='highlight blink'> third line text</div>
</div>";
$data = preg_replace(
'/<(\w+).*[^>]+class=["\'][^"\']*highlight[^"\']*["\'][^>]*>(.*?)<\/\1>/',
'$2',
$data );
echo( $data );
It works for every tag whose class attribute contains the text highlight, and for content that has nested tags inside the outer tag, for example <div class='highlight'>Something <b>else</b></div>
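The regex:
- allows other attributes before class="..."
- allows other attributes after class="..."
- allows extra characters before the closing >
- inside class="...", looks for the word highlight, possibly surrounded by other words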
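One consequence of the pattern above is that highlight is matched as a substring of the class value, so a class such as highlighted would be stripped as well. If that is not wanted, a stricter pattern can require highlight to be a whole, whitespace-separated class name. The following is only a sketch of such a variant: stripHighlightTags is a hypothetical name and the pattern is an assumption, not the answer's regex. It still cannot handle a highlighted tag that contains a nested tag of the same name, which is one more reason to prefer a parser.

```php
<?php
// Hypothetical stricter variant (an assumption, not the answer's pattern).
// Group 1: tag name, group 2: the quote character, group 3: inner content.
function stripHighlightTags(string $data): string
{
    $pattern = '~<(\w+)(?:\s[^>]*)?\sclass=(["\'])'  // tag name, optional attributes, class=
             . '(?:[^"\']*\s)?highlight(?:\s[^"\']*)?' // highlight as a whole class token
             . '\2[^>]*>'                              // same closing quote, rest of the tag
             . '(.*?)</\1>~s';                         // lazy content up to the matching close
    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '$3', $data) ?? $data;
}

$data = "<p>a <span class='highlight blink'>x <b>y</b></span> "
      . "<span class=\"target\">t</span> <i class='highlighted'>k</i></p>";
echo stripHighlightTags($data);
```

The s modifier lets the content group span several lines, and [^>]* keeps the attribute part from running past the end of the opening tag.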