I want to get my HTML data as an array or in XML format so that I can easily save it to a database. This is what I have so far:
$url = "http://www.example.com/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
if($html = curl_exec($ch)){
    // parse the html into a DOMDocument
    $dom = new DOMDocument();
    $dom->recover = true;
    $dom->strictErrorChecking = false;
    @$dom->loadHTML($html);
    $hrefs = $dom->getElementsByTagName('div');
    curl_close($ch);
}else{
    echo "The website could not be reached.";
}
What should I do to get the HTML as an array or in XML format? The HTML comes back like this:
<div>
    <ul>
        <li>Product Name</li>
        <li>Category</li>
        <li>Subcategory</li>
        <li>Product Price</li>
        <li>Product Company</li>
    </ul>
</div>
Answer 0 (score: 1)
For XML output, do the following:
function download_page($path){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $retValue = curl_exec($ch); // false if the request failed
    curl_close($ch);
    return $retValue;
}
$sXML = download_page('http://example.com');
if ($sXML === false) {
    exit("The website could not be reached.");
}
// Send the header once, before any output is echoed
header('Content-type: application/xml');
// SimpleXMLElement throws an exception if $sXML is not well-formed XML
$oXML = new SimpleXMLElement($sXML);
foreach($oXML->entry as $oEntry){
    echo $oEntry->title . "\n";
}
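Note that the snippet above only works if the URL returns well-formed XML containing entry elements (for example an Atom feed). For the HTML list shown in the question, a rough sketch along the following lines might be closer to what is needed. It is not part of the original answer: the helper names list_items_to_array and array_to_xml, the XPath expression //div/ul/li, and the element names product and field are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: collect the text of every <li> found under a <div><ul>.
function list_items_to_array(string $html): array
{
    $dom = new DOMDocument();
    // Buffer libxml warnings for sloppy real-world HTML instead of using @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = [];
    // Assumed structure: the <li> elements sit directly under div > ul.
    foreach ($xpath->query('//div/ul/li') as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}

// Hypothetical helper: wrap the values in a small XML document.
// The element names "product" and "field" are assumptions.
function array_to_xml(array $items): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('product'));
    foreach ($items as $value) {
        $field = $doc->createElement('field');
        // createTextNode escapes characters such as & and < for us.
        $field->appendChild($doc->createTextNode($value));
        $root->appendChild($field);
    }
    return $doc->saveXML();
}

$html = '<div><ul><li>Product Name</li><li>Category</li><li>Subcategory</li>'
      . '<li>Product Price</li><li>Product Company</li></ul></div>';

$items = list_items_to_array($html);
print_r($items);           // plain PHP array, ready for a database insert
echo array_to_xml($items); // the same values serialized as XML
```

In practice $html would be the string returned by curl_exec() in the question. If each product has its own div, looping over the div nodes and running the li query relative to each one would give one array per product.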