Why am I not getting any data even after the HTML is loaded successfully into the $dom object?

Date: 2013-12-06 08:19:05

Tags: php foreach domdocument

I have the following function to clean up HTML data:

function clear_question_data($html) {
    $dom = new DOMDocument();

    $dom->loadHTML($html); /* Here the HTML data loads perfectly; it returns TRUE here */
    die(var_dump($dom)); /* This statement gives output as object(DOMNodeList)#11 (0) { } */

    /*$test = $dom->getElementsByTagName('img');
    die(var_dump($test));*/
    foreach ($dom->getElementsByTagName('img') as $image)
    {   echo "In a loop"; die;
        $image->removeAttribute('alt');
        $image->removeAttribute('xmlns');
        $image->removeAttribute('title');
    }
    echo "Out of the loop"; die;

    $txt = $dom->saveHTML();

    $dom->loadHTML($txt);

    foreach ($dom->getElementsByTagName('img') as $image)
    {
        $srcval = $image->getAttribute('src');

        $srcval = htmlspecialchars_decode($srcval);

        $srcval = str_replace(' ', ' ', $srcval);

        // strpos() returns false when the needle is absent, and false == 0 is true,
        // so test for presence with !== false instead of == 0
        if (strpos($srcval, "%5C%22") !== false)
        {
            $srcval = str_replace("%5C%22", "", $srcval);
            $srcval = str_replace(".png%5C%22", ".png", $srcval);
        }
        if (strpos($srcval, "../../..") !== false)
        {
            $srcval = str_replace("../../..", "", $srcval);
        }
        if (strpos($srcval, "../..") !== false)
        {
            $srcval = str_replace("../..", "", $srcval);
        }
        if (strpos($srcval, "/ckeditor_3.6.1//plugins") !== false)
        {
            $srcval = str_replace("/ckeditor_3.6.1//", EPN_SITE_URL."ckeditor_3.6.1/", $srcval);
        }

        $srcval = str_replace(".png/\"", ".png", $srcval);
        $srcval = str_replace("�", "-", $srcval);

        $image->setAttribute('src', $srcval);
    }
    $final_data = $dom->saveHTML();

    return $final_data;
}
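The src checks in this function hinge on how strpos() is compared: it returns false when the needle is absent, and false == 0 is true under PHP's loose comparison, so a test written as `strpos(...) == 0` passes even when the substring does not occur. A standalone sketch contrasting loose and strict comparison (the helper names are hypothetical, not part of the question's code):

```php
<?php
// Hypothetical helpers illustrating loose vs. strict checks on strpos().
// strpos() returns the integer offset of the needle, or false if it is absent.

function starts_with_loose(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) == 0;       // buggy: also true when $needle is absent
}

function starts_with_strict(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) === 0;      // true only when $haystack begins with $needle
}

function contains_strict(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;  // true when $needle occurs anywhere
}

$src = '/ckeditor_3.6.1//plugins/ckeditor_wiris/integration/showimage.php';

var_dump(starts_with_loose($src, '../../..'));   // bool(true), although '../../..' is absent
var_dump(starts_with_strict($src, '../../..'));  // bool(false)
var_dump(contains_strict($src, '/plugins'));     // bool(true)
```

Because str_replace() is a no-op when the search string is absent, switching these checks to `!== false` does not change the function's output; it only makes the conditions mean what they appear to mean.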

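Why am I getting this empty result even though the HTML data loads successfully? Because of it, execution never enters the foreach loop, so my function has no effect. Can someone help me fix this? The argument (HTML data) I am passing is as follows: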

$html = 'Glucose when hetaed with CH<sub>3</sub>OH in presence of dry HCl gas gives<img align="middle" alt="«math xmlns=¨http://www.w3.org/1998/Math/MathML¨»«mi»§#945;«/mi»«/math»" class="Wirisformula" src="/ckeditor_3.6.1//plugins/ckeditor_wiris/integration/showimage.php?formula=dedbf6a559a928eeeaee82c4b1bf40d3.png" title="Double click to edit"> and <img align="middle" alt="«math xmlns=¨http://www.w3.org/1998/Math/MathML¨»«mi»§#946;«/mi»«/math»" class="Wirisformula" src="/ckeditor_3.6.1//plugins/ckeditor_wiris/integration/showimage.php?formula=2c5cf4a4494a03be06d6c32308a225ba.png" title="Double click to edit">-methyl glycosides because it contains.<br>';
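When debugging this kind of problem, var_dump() may not be informative: depending on the PHP version, it can print a DOMDocument or DOMNodeList with no visible properties even when nodes are present. Checking the node list's `length` property is more direct. A minimal sketch under that assumption; `strip_img_attributes()` is a hypothetical helper, and the sample markup is a shortened stand-in for the input above, not the full string:

```php
<?php
// Hypothetical helper: count the <img> elements DOMDocument actually parsed,
// then strip the same attributes the question's first loop targets.
function strip_img_attributes(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings about the fragment out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $images = $dom->getElementsByTagName('img');
    $count = $images->length;           // element count, independent of what var_dump() shows

    foreach ($images as $image) {
        // Removing an attribute that is not present is harmless
        $image->removeAttribute('alt');
        $image->removeAttribute('xmlns');
        $image->removeAttribute('title');
    }

    return [$count, $dom->saveHTML()];
}

[$count, $out] = strip_img_attributes(
    'gives<img alt="a" src="x.png" title="t"> and <img alt="b" src="y.png" title="t">-methyl'
);
echo $count, PHP_EOL;   // 2: both <img> tags were parsed as elements
echo $out;
```

If `length` is 0 for input that visibly contains `<img` tags, the tags were most likely parsed as text rather than as markup.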

Thanks in advance.

1 answer:

Answer 0 (score: 0)

You should HTML-decode the data before loading it into the DOM.

P.S. I used define('EPN_SITE_URL','example.com');

echo clear_question_data(html_entity_decode($html));

$ php ~/test.php
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Glucose when hetaed with CH<sub>3</sub>OH in presence of dry HCl gas gives<img align="middle" class="Wirisformula" src="example.comckeditor_3.6.1/plugins/ckeditor_wiris/integration/showimage.php?formula=dedbf6a559a928eeeaee82c4b1bf40d3.png"> and <img align="middle" class="Wirisformula" src="example.comckeditor_3.6.1/plugins/ckeditor_wiris/integration/showimage.php?formula=2c5cf4a4494a03be06d6c32308a225ba.png">-methyl glycosides because it contains.<br></p></body></html>
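The decoding step matters if the string reaching clear_question_data() is entity-escaped (for example, stored as `&lt;img ...&gt;`): DOMDocument then parses the tags as plain text, and getElementsByTagName('img') finds nothing, which would be consistent with the empty result described in the question. A minimal sketch, assuming escaped input; `count_images()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: number of <img> elements DOMDocument finds in $html.
function count_images(string $html): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // suppress parser warnings about the fragment
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom->getElementsByTagName('img')->length;
}

// Assumed escaped form of the input: the tags are entities, i.e. just text
$escaped = 'gives&lt;img align="middle" src="a.png"&gt; and &lt;img align="middle" src="b.png"&gt;';

echo count_images($escaped), PHP_EOL;                     // 0: no real <img> elements
echo count_images(html_entity_decode($escaped)), PHP_EOL; // 2: decoding restores the markup
```

If the input already contains literal tags, html_entity_decode() leaves them as they are, so applying it before loadHTML() does not break unescaped markup like the sample above.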