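I have an HTML string, and I want to remove from it all DIVs with the class "toremove".
Doing this on the client side with jQuery or similar is easy, but I want to do it on the server side with PHP.
A simple regular expression will not work, because the divs may be nested...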
Answer 0 (score: 2)
You can remove the DIVs using a DOMDocument object and XPath.
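/** UNTESTED **/
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
// "//div" matches divs at any depth, including nested ones.
$elements = $xpath->query("//div[@class='yourClasshere']");
foreach($elements as $e){
    // A node has to be removed through its own parent, not through the document.
    $e->parentNode->removeChild($e);
}
$doc->saveHTMLFile($file);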
If you would rather work with strings than files, replace loadHTMLFile() with loadHTML() and saveHTMLFile() with saveHTML().
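The string-based variant mentioned above could look roughly like the following. This is a minimal sketch, not the answerer's code: the function name `remove_divs_by_class` is made up for illustration, and it assumes `$class` contains no single quote and that the target divs carry exactly that class value (the `@class='...'` test is an exact match on the whole attribute).

```php
<?php
// Hypothetical helper: string in, string out.
function remove_divs_by_class(string $html, string $class): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them
    // (HTML5 tags and sloppy markup commonly trigger warnings).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // The result of DOMXPath::query() is not a live list, so nodes can be
    // removed while iterating. Nested matches are handled too: the inner div
    // is simply detached from its (already removed) outer div.
    foreach ($xpath->query("//div[@class='$class']") as $div) {
        if ($div->parentNode) {
            $div->parentNode->removeChild($div);
        }
    }

    return $doc->saveHTML();
}

$html = '<p>keep</p><div class="toremove">a<div class="toremove">b</div></div>';
echo remove_divs_by_class($html, 'toremove');
```

Note that loadHTML() wraps a fragment in html and body elements and saveHTML() emits a DOCTYPE, so the returned string is a full document rather than the bare fragment.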
Answer 1 (score: 1)
Here is a piece of code I use to remove content from web pages:
/**
* A method to remove unwanted parts of an HTML-page. Can remove elements by
* id, tag name and/or class names.
*
* @param string $html The HTML to manipulate
* @param array $toRemove An array of arrays, with the keys specifying
* what type of values the array holds. The following keys are used:
* 'elements' - An array of element ids to remove from the html
* 'tags' - An array of tag names to remove from the html
* 'classNames' - An array of class names. Each tag that contains one of the
* class names will be removed from the html.
*
* Also, note that descendants of a removed element will also be removed.
*
* @return string The manipulated HTML content
*
* @example removeHtmlParts($html, array (
* 'elements' => array ('headerSection', 'nav', 'footerSection'),
* 'tags' => array ('form'),
* 'classNames' => array ('promotion')
* ));
*/
public function removeHtmlParts ($html, array $toRemove = array())
{
$document = new \DOMDocument('1.0', 'UTF-8');
$document->encoding = 'UTF-8';
// Hack to force DOMDocument to load the HTML using UTF-8.
@$document->loadHTML('<?xml encoding="UTF-8">' . $html);
$partsToRemove = array ();
if(isset($toRemove['elements']))
{
$partsToRemove['elements'] = $toRemove['elements'];
}
if(isset($toRemove['tags']))
{
$partsToRemove['tags'] = $toRemove['tags'];
}
if(isset($toRemove['classNames']))
{
$partsToRemove['classNames'] = $toRemove['classNames'];
}
foreach ($partsToRemove as $type => $content)
{
if($type == 'elements')
{
foreach ($content as $elementId)
{
$element = $document->getElementById($elementId);
if($element)
{
$element->parentNode->removeChild($element);
}
}
}
elseif($type == 'tags')
{
foreach($content as $tagName)
{
$tags = $document->getElementsByTagName($tagName);
while($tags->length)
{
$tag = $tags->item(0);
if($tag)
{
$tag->parentNode->removeChild($tag);
}
}
}
}
elseif($type == 'classNames')
{
foreach ($content as $className)
{
$xpath = new \DOMXPath($document);
$xpathExpression = sprintf(
'//*[contains(@class,"%s")]',
$className
);
$domNodeList = $xpath->evaluate($xpathExpression);
for($i = 0; $i < $domNodeList->length; $i++)
{
$node = $domNodeList->item($i);
if($node && $node->parentNode)
{
$node->parentNode->removeChild($node);
}
}
}
}
}
return $document->saveHTML();
}
Answer 2 (score: 1)
Based on jebbench's short answer and PatrikAkerstrand's long answer, I wrote a medium-sized function that solves my problem completely:
/**
* remove, from the given xhtml string, all divs with the given class.
*/
function remove_divs_with_class($xhtml, $class) {
$doc = new DOMDocument();
// Hack to force DOMDocument to load the HTML using UTF-8:
$doc->loadHTML('<?xml encoding="UTF-8">'.$xhtml);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*[contains(@class,'$class')]");
foreach ($elements as $element)
$element->parentNode->removeChild($element);
return $doc->saveHTML();
}
/* UNIT TEST */
if (basename(__FILE__)==basename($_SERVER['PHP_SELF'])) {
$xhtml = "<div class='near future'>near future</div><div>start</div><div class='future'>future research</div><div class='summary'>summary</div><div class='a future b'>far future</div>";
$xhtml2 = remove_divs_with_class($xhtml, "future");
print "<h2>before</h2>$xhtml<h2>after</h2>$xhtml2";
}
/* OUTPUT:
before
near future
start
future research
summary
far future
after
start
summary
*/
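One caveat with the function above: `contains(@class, 'future')` is a substring test on the whole attribute, so it would also match an unrelated class such as `futures`, and `//*` matches any element, not only divs. A commonly used XPath 1.0 idiom pads the attribute with spaces and looks for the class as a whole token. The following is a sketch under that assumption; the name `remove_divs_with_class_token` is hypothetical, and `$class` is assumed to contain no quotes or spaces.

```php
<?php
// Hypothetical variant that matches the class as a whole word, on divs only.
function remove_divs_with_class_token(string $xhtml, string $class): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Same UTF-8 hack as in the answer above.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $xhtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // " a future b " contains " future ", but " futures " does not.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    foreach ($xpath->query($query) as $div) {
        if ($div->parentNode) {
            $div->parentNode->removeChild($div);
        }
    }
    return $doc->saveHTML();
}

$xhtml = "<div class='futures'>kept</div><div class='near future'>removed</div>";
echo remove_divs_with_class_token($xhtml, 'future');
```

With this input the `futures` div should survive while the `near future` div is dropped, whereas the plain `contains()` test would remove both.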
Answer 3 (score: -1)
Never try to parse XML/HTML with regular expressions. Use a parsing library instead. Apparently, one for PHP is http://sourceforge.net/projects/simplehtmldom/files/