Get the content of a remote HTML page

Posted: 2014-06-23 14:48:59

Tags: php html parsing curl

I'm using the example from the post "How to get content from another page", but I need to get "SUPERMAN" from a website that uses this format:

<td headers="superHero">SUPERMAN</td>
<td headers="country">USA</td>

Code:

$url = "http://www.otherweb.com";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$output = curl_exec($curl);
curl_close($curl);


$DOM = new DOMDocument;
$DOM->loadHTML( $output);

//get all td
//$items = $DOM->getElementsByTagName('td'); 
$items = $DOM->getElementsByID('superHero');

//display all text
for ($i = 0; $i < $items->length; $i++) {
    echo $items->item($i)->nodeValue . "<br/>";
}

Thanks!

1 answer:

Answer 0 (score: 1)

First, you can skip the cURL part. DOMDocument can load even remote HTML files with its loadHTMLFile() method. Just use:

$DOM = new DOMDocument();
$DOM->loadHTMLFile($url);
// If the remote page might not be valid HTML, you might want to
// use the error-suppression operator @ instead:
@$DOM->loadHTMLFile($url);
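As an alternative to the @ operator, PHP's libxml_use_internal_errors(true) makes the parser buffer its errors instead of emitting warnings, so you can still inspect them afterwards. The sketch below assumes a hypothetical inline page (with an HTML5 <nav> tag, which the libxml HTML parser reports as invalid) and uses loadHTML() so it runs offline; loadHTMLFile($url) goes through the same parser, so the same pattern should apply to the remote page.

```php
<?php
// Sketch: parse imperfect HTML without PHP warnings by buffering libxml errors
// instead of silencing them with @.
// The markup is a hypothetical stand-in for the remote page; <nav> is unknown
// to libxml's HTML 4 parser, so it produces at least one buffered error.
$html = '<nav>menu</nav>'
      . '<table><tr>'
      . '<td headers="superHero">SUPERMAN</td>'
      . '<td headers="country">USA</td>'
      . '</tr></table>';

$previous = libxml_use_internal_errors(true); // buffer errors, remember old setting

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);              // returns true on success

$errors = libxml_get_errors();                // array of LibXMLError objects
libxml_clear_errors();                        // empty the buffer
libxml_use_internal_errors($previous);        // restore the previous setting

echo $loaded ? "parsed\n" : "failed\n";
foreach ($errors as $error) {
    echo 'libxml: ' . trim($error->message) . "\n";
}
```

Unlike @, this keeps the error details available, which can help when a remote page changes and the expected cells stop matching.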

If you want to select an element by the value of one of its attributes, use XPath:

$selector = new DOMXPath($DOM);
$element = $selector->query('//td[@headers="superHero"]')->item(0);
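To show the XPath approach end to end, here is a minimal sketch that reads the cell's text. It assumes the markup from the question and parses it from an inline string with loadHTML() so it runs without network access; against the real site you would call loadHTMLFile($url) instead. Note that DOMXPath::query() returns a DOMNodeList, and item(0) is null when nothing matches, so the result is checked before use.

```php
<?php
// Sketch: select <td> cells by their "headers" attribute with DOMXPath.
// The inline HTML mirrors the question's snippet; for the live page,
// replace loadHTML($html) with loadHTMLFile($url).
$html = <<<HTML
<table>
  <tr>
    <td headers="superHero">SUPERMAN</td>
    <td headers="country">USA</td>
  </tr>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; item(0) is null if no cell matched.
$nodes = $xpath->query('//td[@headers="superHero"]');
$element = $nodes->item(0);

if ($element !== null) {
    $hero = trim($element->textContent);
    echo $hero . "\n";
} else {
    echo "no matching cell\n";
}

// The attribute value is compared exactly, so "country" selects the other cell.
$countryNode = $xpath->query('//td[@headers="country"]')->item(0);
$country = $countryNode !== null ? trim($countryNode->textContent) : null;
echo $country . "\n";
```

Running it prints SUPERMAN followed by USA. The XPath predicate [@headers="..."] does the attribute matching that getElementById() cannot, since that method only looks at id attributes.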