I have the following function to clean up HTML data.
function clear_question_data($html) {
$dom = new DOMDocument();
$dom->loadHTML($html);/*Here the HTML data is loading perfectly, it's returning TRUE here*/
die(var_dump($dom));/*This statement gives output as object(DOMNodeList)#11 (0) { }*/
/*$test = $dom->getElementsByTagName('img');
die(var_dump($test));*/
foreach($dom->getElementsByTagName('img') as $image)
{ echo "In a loop"; die;
$image->removeAttribute('alt');
$image->removeAttribute('xmlns');
$image->removeAttribute('title');
}
echo "Out of the loop"; die;
$txt=$dom->saveHTML();
$dom->loadHTML($txt);
foreach($dom->getElementsByTagName('img') as $image)
{
$srcval=$image->getAttribute('src');
$srcval = htmlspecialchars_decode($srcval);
$srcval = str_replace(' ', ' ', $srcval);
if(strpos($srcval,"%5C%22")==0)
{
$srcval = str_replace("%5C%22", "", $srcval);
$srcval = str_replace(".png%5C%22", ".png", $srcval);
}
if(strpos($srcval,"../../..")==0)
{
$srcval = str_replace("../../..", "", $srcval);
}
if(strpos($srcval,"../..")==0)
{
$srcval = str_replace("../..", "", $srcval);
}
if(strpos($srcval,"/ckeditor_3.6.1//plugins")==0)
{
$srcval = str_replace("/ckeditor_3.6.1//", EPN_SITE_URL."ckeditor_3.6.1/", $srcval);
}
$srcval = str_replace(".png/\"", ".png", $srcval);
$srcval = str_replace("�", "-", $srcval);
$image->setAttribute('src',$srcval);
}
$final_data=$dom->saveHTML();
return $final_data;
}
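The HTML loads successfully, so I don't understand why I get this empty result. Because of it, execution never enters the foreach loop, and my function has no effect. Can someone help me fix this? The argument (HTML data) I'm passing is as follows: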
$html=Glucose when hetaed with CH<sub>3</sub>OH in presence of dry HCl gas gives<img align="middle" alt="�math xmlns=�http://www.w3.org/1998/Math/MathML���mi��#945;�/mi��/math�" class="Wirisformula" src="/ckeditor_3.6.1//plugins/ckeditor_wiris/integration/showimage.php?formula=dedbf6a559a928eeeaee82c4b1bf40d3.png" title="Double click to edit"> and <img align="middle" alt="�math xmlns=�http://www.w3.org/1998/Math/MathML���mi��#946;�/mi��/math�" class="Wirisformula" src="/ckeditor_3.6.1//plugins/ckeditor_wiris/integration/showimage.php?formula=2c5cf4a4494a03be06d6c32308a225ba.png" title="Double click to edit">-methyl glycosides because it contains.<br>
Thanks in advance.
Answer 0 (score: 0)
You should HTML-decode the string before loading it into the DOM.
P.S. I used define('EPN_SITE_URL','example.com');
echo clear_question_data(html_entity_decode($html));
$ php ~/test.php
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Glucose when hetaed with CH<sub>3</sub>OH in presence of dry HCl gas gives<img align="middle" class="Wirisformula" src="example.comckeditor_3.6.1/plugins/ckeditor_wiris/integration/showimage.php?formula=dedbf6a559a928eeeaee82c4b1bf40d3.png"> and <img align="middle" class="Wirisformula" src="example.comckeditor_3.6.1/plugins/ckeditor_wiris/integration/showimage.php?formula=2c5cf4a4494a03be06d6c32308a225ba.png">-methyl glycosides because it contains.<br></p></body></html>
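To make the decode-then-load step easier to try in isolation, here is a minimal sketch. It assumes the input arrives entity-encoded (for example `&lt;img ...&gt;`), which is what would keep `getElementsByTagName('img')` from finding anything. The function name `strip_img_attributes` and the sample string are hypothetical and only for illustration; they are not from the original post.

```php
<?php
// Hypothetical helper for illustration; not part of the original function.
function strip_img_attributes(string $html): string
{
    // Decode entities first so that "&lt;img ...&gt;" becomes a real <img> tag.
    // Assumption: the input is entity-encoded. If it is already plain HTML,
    // this call would also decode entities that were meant to stay encoded.
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    $dom = new DOMDocument();

    // Collect libxml parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($decoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove the same attributes the question's first loop removes.
    foreach ($dom->getElementsByTagName('img') as $image) {
        foreach (['alt', 'xmlns', 'title'] as $name) {
            $image->removeAttribute($name);
        }
    }

    return $dom->saveHTML();
}

// Made-up, entity-encoded sample input.
$sample = 'Text &lt;img alt=&quot;a&quot; title=&quot;t&quot; src=&quot;/x.png&quot;&gt; end';
echo strip_img_attributes($sample);
```

Note that the debugging `die` calls in the question stop the script before either loop can run, so they need to be removed before checking whether decoding fixes the problem.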