How to extract data from HTML

Posted: 2015-11-09 19:58:45

Tags: php curl raspberry-pi domdocument

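I'm trying to write a PHP script that pulls the snow totals and other data from http://www.snowbird.com/mountain-report so I can show them on an LED array. I'm having trouble getting the data I need and can't find a way to make it work. I've read that PHP isn't the best tool for this. Can I get it done in PHP, or will I have to use a different language? Here is the code I can't get working: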

<?php
include_once('simple_html_dom.php');


// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://www.snowbird.com/mountain-report/");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);


// parse the returned markup with simple_html_dom (included above),
// whose find() method the calls below rely on
$html = str_get_html($output);

$ret1 = $html->find('div[id=twelve-hour]');
print_r ($ret1);
$ret2 = $html->find('#twenty-four-hour');
print_r ($ret2);
$ret3 = $html->find('#forty-eight-hour');
print_r ($ret3);
$ret4 = $html->find('#current-depth');
print_r ($ret4);
$ret5 = $html->find('#year-to-date');
print_r ($ret5);
?>

1 Answer

Answer 0 (score: 0)

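This is an old question, but it's easy enough to answer. Use an XPath query to get the text value of the right nodes. (This ought to be as simple as passing the URL straight to DOMDocument::loadHTMLFile(), but the server decides what to send based on the request's user agent, so we have to fake one.)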
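Because the question fetches the page with cURL rather than file_get_contents(), here is a minimal sketch of the same request with a User-Agent header set through CURLOPT_USERAGENT. The function name fetch_with_user_agent is hypothetical, and following redirects is an added assumption in case the site now redirects to HTTPS.

```php
<?php

// Hypothetical helper: fetch a URL with cURL while sending a browser-like
// User-Agent header. Returns the response body, or null if the request failed.
function fetch_with_user_agent(string $url, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // the header the server varies on
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumption: follow redirects (e.g. to https)
    $body = curl_exec($ch);
    // the handle is released when $ch goes out of scope
    return $body === false ? null : $body;
}
```

Passing the mountain-report URL and the Firefox user-agent string from the code below returns the page as a string, ready for DOMDocument::loadHTML(); a null result means the request failed.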

<?php

// The server varies its response by User-Agent, so send a browser-like one
$ctx = stream_context_create(["http" => [
    "user_agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:53.0) Gecko/20100101 Firefox/53.0",
]]);
// false: don't search the include path (irrelevant for a URL)
$html = file_get_contents("http://www.snowbird.com/mountain-report/", false, $ctx);
if ($html === false) {
    exit("Could not fetch the mountain report\n");
}

// Real-world HTML is rarely valid; keep libxml from printing parse warnings
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);
$xp = new DOMXPath($doc);
$root = $doc->getElementById("snowfall");
if ($root === null) {
    exit("No #snowfall element found; the page layout may have changed\n");
}

// Output key => id of the corresponding <div> inside #snowfall
$ids = [
    "12hour"  => "twelve-hour",
    "24hour"  => "twenty-four-hour",
    "48hour"  => "forty-eight-hour",
    "current" => "current-depth",
    "ytd"     => "year-to-date",
];

$snowfall = [];
foreach ($ids as $key => $id) {
    // The figure is the text node inside div.total-inches; null if it is missing
    $nodes = $xp->query("div[@id='$id']/div[@class='total-inches']/text()", $root);
    $snowfall[$key] = ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : null;
}

print_r($snowfall);
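To check what the relative XPath expressions above match without contacting the live site, the same queries can be run against a local fragment. The markup here is a hypothetical reconstruction based only on the ids and class names used in the answer; the real page may be structured differently (and has likely changed since), and the snowfall_value helper name is likewise an illustration.

```php
<?php

// Hypothetical markup mirroring the ids/classes the answer queries;
// the live page may be structured differently.
$sample = <<<HTML
<html><body>
<div id="snowfall">
  <div id="twelve-hour"><div class="total-inches">2</div><div class="label">12 hr</div></div>
  <div id="twenty-four-hour"><div class="total-inches">5</div></div>
  <div id="current-depth"><div class="total-inches">48</div></div>
</div>
</body></html>
HTML;

// Return the text of div.total-inches under the child div with the given id,
// searched relative to $root, or null when no such node exists.
function snowfall_value(DOMXPath $xp, DOMNode $root, string $id): ?string
{
    $nodes = $xp->query("div[@id='$id']/div[@class='total-inches']/text()", $root);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

libxml_use_internal_errors(true); // silence warnings about non-standard HTML
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xp = new DOMXPath($doc);
$root = $doc->getElementById("snowfall");

$values = [];
foreach (["twelve-hour", "twenty-four-hour", "forty-eight-hour", "current-depth"] as $id) {
    $values[$id] = snowfall_value($xp, $root, $id);
}
print_r($values);
```

Because the sample has no forty-eight-hour block, that entry comes back as null instead of triggering a "call to a member function on null" error, which is the failure mode of chaining ->item(0)->textContent directly.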