Parsing issue when HTML contains JavaScript

Time: 2013-11-16 09:24:03

Tags: php html parsing

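After a lot of searching and trying many APIs, I am using PHP to fetch the content of a div.

When I try alternative parsing APIs, they do not return the correct content of that div either.

Here is the code, for your reference: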

<?php
$curl = curl_init();
$headers[] = "Accept: */*";
$headers[] = "Cache-Control: max-age=0";
$headers[] = "Connection: keep-alive";
$headers[] = "Keep-Alive: 300";
$headers[] = "Accept-Charset: utf-8;ISO-8859-1;iso-8859-2;q=0.7,*;q=0.7";
$headers[] = "Accept-Language: en-us,en;q=0.5";
$headers[] = "Pragma: "; // browsers keep this blank.
@curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
@curl_setopt($curl, CURLOPT_VERBOSE, false);
@curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
@curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
@curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
@curl_setopt($curl, CURLOPT_AUTOREFERER, true);
@curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
@curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
@curl_setopt($curl, CURLOPT_HEADER, false);
@curl_setopt($curl, CURLOPT_TIMEOUT, 1000);
@curl_setopt($curl, CURLOPT_URL, "http://www.hindishows.com/cid-sony/case-of-a-lawyers-mysterious-death-episode-1/1027.html");
$page = curl_exec($curl);

//$dom = HTML5_Parser::parse($page); 
//var_dump($dom->saveXml()); 

$dom = new DOMDocument();
$dom->resolveExternals = true;
@$dom->load($page);


$finder = new DomXPath($dom);
$elements = $finder->query("//div[@id='yt-video-box']");
echo "null:".is_null($elements);
if (!is_null($elements)) {
    foreach ($elements as $element) {
        print(var_dump($element->saveXML()));
    }
}
?>

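The output is only "null:" and then the script finishes.

It should output the actual content of that div.

I have also asked the same question at http://www.find4answers.com/483/parsing-issue-when-html-contains-javascript

Can you help me? Thanks.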

1 answer:

Answer 0: (score: 0)

I would go the SimpleHTML way.

Something like this:

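include_once('libs/simple_html_dom.php');
// file_get_html() fetches the URL and returns a simple_html_dom object,
// so no separate DOMDocument is needed.
// $item->link is the page URL taken from my own code; substitute yours.
$dm = file_get_html($item->link);
foreach ($dm->find('div') as $element) {
    $tmpInnerText = $element->innertext;
    echo $tmpInnerText . PHP_EOL;
}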
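
If you would rather stay with the built-in DOM extension, here is a minimal sketch of the same lookup. Two things in the question's code look suspicious to me: DOMDocument::load() expects a file path, while curl_exec() returns the page as a string (which is what loadHTML() parses), and DOMElement has no saveXML() method, so the node has to be serialized through the document. The inline $page string below is a made-up stand-in for the fetched page, not the real site, and I have not checked the actual markup. Also note that if the div is only filled in by JavaScript in the browser, no server-side HTML parser will see that content.

```php
<?php
// Hypothetical sample standing in for the string returned by curl_exec().
$page = <<<HTML
<html>
<head><script type="text/javascript">var boxId = "yt-video-box";</script></head>
<body>
<div id="yt-video-box"><p>Video goes here</p></div>
</body>
</html>
HTML;

$dom = new DOMDocument();
// Real-world pages are rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
// loadHTML() parses an HTML string; load() would treat $page as a file name.
$dom->loadHTML($page);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$elements = $finder->query("//div[@id='yt-video-box']");

$found = array();
// query() returns a DOMNodeList (possibly empty), or false for a bad expression.
if ($elements !== false) {
    foreach ($elements as $element) {
        // Serialize the matched node through its owning document.
        $found[] = $dom->saveHTML($element);
    }
}

echo "matches: " . count($found) . PHP_EOL;
foreach ($found as $html) {
    echo $html . PHP_EOL;
}
```

With a real page you would pass the curl_exec() result to loadHTML() in place of the sample string. A match count of 0 on the raw HTML would suggest the div is created client-side.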