用html错误解析html代码

时间:2011-05-03 23:12:06

标签: php html-parsing

我想使用php解析链接:http://dizli.com/dizli/db.html

但是当我编写代码时,

$url = "http://dizli.com/dizli/db.html";
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false; 
$tables = $dom->getElementsByTagName('table');
$tr = $tables->item(2)->getElementsByTagName('tr');
$rows = $tables->item(0)->getElementsByTagName('td');

foreach($rows as $row)
{
    $movie = $row->getElementsByTagName('b');
    echo $movie;}

我收到了很多错误:

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and td in http://dizli.com/dizli/db.html, line: 54 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 81 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 106 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://dizli.com/dizli/db.html, line: 115 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and b in http://dizli.com/dizli/db.html, line: 126 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and font in http://dizli.com/dizli/db.html, line: 126 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 128 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://dizli.com/dizli/db.html, line: 1575 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag blink invalid in http://dizli.com/dizli/db.html, line: 2190 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and b in http://dizli.com/dizli/db.html, line: 2200 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and font in http://dizli.com/dizli/db.html, line: 2200 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: body and center in http://dizli.com/dizli/db.html, line: 2225 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Catchable fatal error: Object of class DOMNodeList could not be converted to string in C:\development\app_server\C7\Lib\Tools\News.php on line 102

有人可以帮我解析这个链接,这样我就可以保存电影的名字和导演的名字。

提前致谢。 Zeeshan

3 个答案:

答案 0 :(得分:5)

要隐藏错误并仍然使用该代码,请在@之前添加广告$dom,例如:

$html = @$dom->loadHTMLFile($url);

答案 1 :(得分:1)

该页面是用非常古老的HTML代码编写的(您可以通过FONT标签,大小写等来表示),因此< br>标签和可能的段落和其他东西也没有关闭。在这种情况下,我建议使用正则表达式来查找它们。

答案 2 :(得分:1)

你的主要问题是最后一行:

echo $movie;

$movieDOMNodeList的一个实例,所以你不能回应它,你需要获得它的元素,例如$movie->item(0)

你也可以var_dump $movie {{1}},看看会给你带来什么。

您可能忽略的警告,取决于您获得的输出。