Scraping an HTML page gives a strange result

Time: 2015-11-29 20:07:25

Tags: php html scrape

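The scraping works, but strangely the result is ["-3°"]

I have tried a lot of different things to get just -3°

But how do the [ and "] show up if they are not in the code?

Can someone give me some guidance on how to get the plain value?

The code I am using is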

<?php
function scrape($url){
    $output = file_get_contents($url);
    return $output;
}

function fetchdata($data, $start, $end){
    $data = stristr($data, $start);         // Stripping all data from before $start
    $data = substr($data, strlen($start));  // Stripping $start
    $stop = stripos($data, $end);           // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop);        // Stripping all data from after and including the $end of the data to scrape
    return $data;                           // Returning the scraped data from the function
}

$page = scrape("https://weather.gc.ca/city/pages/bc-37_metric_e.html");
$result = fetchdata($page, "<p class=\"text-center mrgn-tp-md mrgn-bttm-sm lead\"><span class=\"wxo-metric-hide\">", "<abbr title=\"Celsius\">C</abbr>");
echo json_encode(array($result));
?>

Thanks in advance for your help!

1 answer:

Answer 0: (score: 0)
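On the literal question of where the [ and "] come from: they are most likely produced by json_encode(array($result)) on the last line of the question's code, which serializes a one-element PHP array as a JSON array. A minimal sketch, assuming the scraped value is the string -3&deg; (an assumption, since the page markup is not shown in the question):

```php
<?php
// Assumed sample value; the real page content is not shown in the question.
$result = "-3&deg;";

// json_encode() on an array emits JSON array syntax: brackets plus a quoted string.
$asJson = json_encode(array($result));

// Decoding the HTML entity gives the plain text a reader expects.
$asText = html_entity_decode($result, ENT_QUOTES, 'UTF-8');

echo $asJson, PHP_EOL;  // ["-3&deg;"]  (a browser renders this as ["-3°"])
echo $asText, PHP_EOL;  // -3°
```

Echoing $result directly, instead of wrapping it in an array and JSON-encoding it, should therefore drop the brackets and quotes.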

You can use DOMDocument to parse the HTML file.

$page = file_get_contents("https://weather.gc.ca/city/pages/bc-37_metric_e.html");
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // Suppress warnings caused by imperfect real-world HTML
$doc->loadHTML($page);
libxml_clear_errors();              // Discard the collected parse errors
libxml_use_internal_errors(false);
$paragraphs = $doc->getElementsByTagName('p');
foreach($paragraphs as $p){
    // Find the paragraph that holds the current temperature
    if($p->getAttribute('class') == 'text-center mrgn-tp-md mrgn-bttm-sm lead') {
        foreach($p->getElementsByTagName('span') as $span) {
            // Only the metric value, not the imperial one
            if($span->getAttribute('class') == 'wxo-metric-hide') {
                foreach($span->getElementsByTagName('abbr') as $abbr) {
                    if($abbr->getAttribute('title') == 'Celsius') {
                        // nodeValue of the span includes the text of the nested abbr
                        echo trim($span->nodeValue);
                    }
                }
            }
        }
    }
}

Output:

-3°C

This assumes the class names and the page structure stay consistent...
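If the class order or the surrounding structure does change, an exact string comparison on the class attribute stops matching. A hedged sketch of a somewhat more tolerant lookup with DOMXPath, run against a small inline sample whose markup is assumed from the selectors in the question (the live page may differ). The helper name extractTemperature is hypothetical:

```php
<?php
// Hypothetical helper: returns the metric temperature text, or null if not found.
function extractTemperature(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings from imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, regardless of other classes or their order,
    // and require a child abbr marked as Celsius.
    $query = '//span[contains(concat(" ", normalize-space(@class), " "), " wxo-metric-hide ")]'
           . '[abbr[@title="Celsius"]]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Assumed sample markup, reconstructed from the selectors used in the question.
$sample = '<p class="lead text-center"><span class="wxo-metric-hide">-3&deg;'
        . '<abbr title="Celsius">C</abbr></span></p>';

echo extractTemperature($sample), PHP_EOL; // -3°C
```

This still depends on the wxo-metric-hide class and the Celsius abbr being present, so it only relaxes the assumption rather than removing it.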