I'm looking for a way in PHP to remove certain tags from a string. I have this string:
hello
<div class="test-1 safe">Hi everybody</div>
<div>Hello world</div>
<p>Hi guys, this is a text</p>
<div class="test">this is another text</div>
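I'm trying to write a method that removes every div tag from the string except those that have the "safe" class, and that also strips the "safe" class from the divs it keeps. For example, in this case I want this output: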
hello
<div class="test-1">Hi everybody</div>
Hello world
<p>Hi guys, this is a text</p>
this is another text
I started with a regex:
public static function clean_text($text, $parent = '')
{
    // Strips every opening and closing div tag, regardless of its class.
    $cleanText = preg_replace("/<\/?div[^>]*>/i", "", $text);
    return $cleanText;
}
But it removes all the divs. I then moved to DOMDocument, but I still have problems: extra HTML such as a doctype gets inserted, and there are encoding issues.
public static function clean_text($text, $parent = '')
{
    //some unnecessary code before...
    $cleanText = $text;
    //parsing DOM
    $dom = new \DOMDocument();
    $dom->loadHTML($cleanText);
    $divs = $dom->getElementsByTagName('div');
    $i = $divs->length - 1;
    while ($i > -1) {
        $div = $divs->item($i);
        if ($div->hasAttribute('class') && strstr($div->getAttribute('class'), 'safe'))
        {
            $class = $div->getAttribute('class');
            $class = str_replace('safe','',$class);
            $div->removeAttribute('class');
            $div->setAttribute('class',$class);
        }
        else
        {
            $txt = $div->nodeValue;
            $newelement = $dom->createTextNode($txt);
            $div->parentNode->replaceChild($newelement, $div);
        }
        $i--;
    }
    $text = $dom->saveHTML();
    return $text;
}
Is there a simpler way?
Thank you very much for your help.
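A note on the DOMDocument attempt above: the extra doctype and html/body tags typically appear because saveHTML() with no argument serializes the whole document. One possible way around that, shown here only as a sketch, is to wrap the fragment in a known root element and serialize just that root's children. The function name clean_text_dom and the wrapper id clean-text-root are hypothetical, and the leading XML encoding declaration is a commonly used workaround that assumes the input is UTF-8. Unlike the nodeValue approach, this version keeps any markup nested inside an unwrapped div.

```php
<?php
// Sketch: unwrap every <div> without the "safe" class; strip "safe" from the rest.
function clean_text_dom(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Hypothetical wrapper element so the fragment has a single known root.
    // The encoding declaration is an assumption: it hints that input is UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8"><div id="clean-text-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $root = $xpath->query('//div[@id="clean-text-root"]')->item(0);
    if ($root === null) {
        return $html; // parsing did not produce the wrapper; give the input back
    }

    // Snapshot the descendant divs before the tree is modified.
    $divs = iterator_to_array($xpath->query('.//div', $root), false);

    foreach ($divs as $div) {
        $classes = preg_split('/\s+/', $div->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);

        if (in_array('safe', $classes, true)) {
            // Keep the div, drop only the "safe" token.
            $kept = array_diff($classes, ['safe']);
            if ($kept) {
                $div->setAttribute('class', implode(' ', $kept));
            } else {
                $div->removeAttribute('class');
            }
            continue;
        }

        // Unwrap: move the children in front of the div, then remove the div.
        while ($div->firstChild !== null) {
            $div->parentNode->insertBefore($div->firstChild, $div);
        }
        $div->parentNode->removeChild($div);
    }

    // Serialize only the wrapper's children, so no doctype/html/body is emitted.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling clean_text_dom() on the sample string should leave the "test-1" div in place without its "safe" class and reduce the other two divs to their text. Class names are compared as whole tokens, so a class such as "unsafe" is not treated as "safe", which the strstr() check would do.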
Answer 0 (score: 0)
You can do this with a negative lookahead:
$pattern = array(
    // unwrap divs whose class attribute does not end in " safe"
    '~<div(?![^>]*class="[^"]+ safe")[^>]*>(.*?)</div>~s',
    // then remove the trailing " safe" from the divs that remain
    '~(<div[^>]+class="[^"]+) safe"~s');
$replace = array('\1', '\1"');
$str = preg_replace($pattern, $replace, $str);
echo "<pre>".htmlspecialchars($str)."</pre>";
Output:
hello
<div class="test-1">Hi everybody</div>
Hello world
<p>Hi guys, this is a text</p>
this is another text
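Note that the patterns above only recognize "safe" when it is the last class and is preceded by at least one other class, as in class="test-1 safe". If "safe" can appear anywhere in the attribute, one option is a callback that inspects the class list of each div. The following is only a sketch under that assumption: clean_text_regex is a hypothetical name, and like the patterns above it does not handle nested divs or single-quoted attributes.

```php
<?php
// Sketch: regex variant that accepts "safe" at any position in the class list.
function clean_text_regex(string $html): string
{
    $result = preg_replace_callback(
        '~<div\b([^>]*)>(.*?)</div>~is',
        function (array $m): string {
            // $m[1] holds the raw attribute string, $m[2] the inner content.
            if (!preg_match('~(?<![\w-])class="([^"]*)"~i', $m[1], $c)) {
                return $m[2]; // no class attribute: unwrap
            }

            $classes = preg_split('/\s+/', $c[1], -1, PREG_SPLIT_NO_EMPTY);
            if (!in_array('safe', $classes, true)) {
                return $m[2]; // "safe" is not one of the classes: unwrap
            }

            // Keep the div and rebuild its class attribute without "safe".
            $kept = array_diff($classes, ['safe']);
            $newClass = $kept ? 'class="' . implode(' ', $kept) . '"' : '';
            $attrs = rtrim(str_replace($c[0], $newClass, $m[1]));

            return '<div' . $attrs . '>' . $m[2] . '</div>';
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}
```

For the sample string in the question this produces the same output as shown above. It also handles class="safe big" and a bare class="safe".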