I get errors when I try to parse Google's search results:
$html = file_get_contents('http://www.google.dk/search?q='.urlencode($query).'&start=0&num=100', false, $context);
$doc = new DOMDocument();
$doc->loadHTML($html);
PHP Warning: DOMDocument::loadHTML(): Input is not proper UTF-8, indicate encoding ! in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
Warning: DOMDocument::loadHTML(): Input is not proper UTF-8, indicate encoding ! in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
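Answer 0 (score: 1)
libxml has built-in error handling that helps here: call libxml_use_internal_errors(true) before loading the markup, so parse errors are collected internally instead of being raised as PHP warnings.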
$query='php rocks';
$data=file_get_contents('http://www.google.co.uk/search?q='.urlencode( $query ).'&start=0&num=100');
// Collect libxml parse errors internally instead of emitting PHP warnings
libxml_use_internal_errors( true );
$html = new DOMDocument('1.0','utf-8');
// Parser options must be set before loadHTML() is called
$html->validateOnParse=false;
$html->standalone=true;
$html->preserveWhiteSpace=true;
$html->strictErrorChecking=false;
$html->substituteEntities=false;
$html->recover=true;
$html->formatOutput=true;
$html->loadHTML( $data );
// Keep the most recent parse error for logging, then empty the error buffer
$parse_errs=serialize( libxml_get_last_error() );
libxml_clear_errors();
$xpath=new DOMXPath( $html );
// Look up the element with id "ires" and query the result links relative to it
$div=$html->getElementById('ires');
$col=$xpath->query("ol/li/h3/a", $div );
foreach( $col as $node ) echo $node->getAttribute('href').'<br />';
// Release the document and the XPath object
$html=null;
$xpath=null;
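To try the error handling without hitting Google, here is a minimal sketch that parses a small hard-coded snippet. The `$sample` markup is hypothetical and only imitates the `ires` list structure queried above. The bare `&` in its `href` is the kind of input behind the `htmlParseEntityRef` warning, though whether it is reported may depend on the libxml2 version in use.

```php
<?php
// Hypothetical sample markup (an assumption, not real Google output).
// The unescaped "&" in the href is deliberately malformed HTML.
$sample = '<html><body><div id="ires"><ol>'
        . '<li><h3><a href="/url?q=http://example.com/&sa=U">Example</a></h3></li>'
        . '</ol></div></body></html>';

// Collect parser errors internally and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($sample);

// libxml_get_errors() returns every collected LibXMLError object,
// while libxml_get_last_error() returns only the most recent one.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

// Report whatever the parser complained about (possibly nothing).
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Despite the malformed entity, the document is still queryable.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//div[@id="ires"]/ol/li/h3/a') as $a) {
    $links[] = $a->getAttribute('href');
}
print_r($links);
```

Reading the whole error list instead of only the last entry makes it easier to log what was wrong with a page while still extracting the links from it.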