Screen scraping in PHP using file_get_contents

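Time: 2012-08-14 14:38:21

Tags: php screen screen-scraping

Hi, I'm very new to screen scraping. I'm trying to scrape reviews from a hotel booking site to display on my...

I've got this far but am a bit stuck. Can anyone help?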

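<?php
$data = file_get_contents('http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx');
// "#" is used as the delimiter so the "/" in "</div>" does not end the pattern early;
// the "s" modifier lets "." match newlines inside the div.
$regex = '#<div id="summary">(.+?)</div>#s';
if ($data !== false && preg_match($regex, $data, $match)) {
    echo $match[1]; // everything up to the first closing </div>
}
?>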

1 answer:

Answer 0: (score: 2)

Use DOMDocument

<?php
  define('URL', 'http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx');
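  libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
  $doc = new DOMDocument();
  $doc->loadHTML(file_get_contents(URL));
  $summary = $doc->getElementById('summary');
  // $doc->getElementsByTagName() and similar lookups are also available
  if ($summary !== null) {
    echo $doc->saveHTML($summary); // print the markup of the matched element
  }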
?>

Also, for more complex queries, you should consider looking into XPath (its path expressions play a role similar to jQuery selectors).
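
As a rough sketch of what an XPath query could look like here, the snippet below uses PHP's DOMXPath on an inline HTML string. The markup, the `review` and `note` class names, and the review text are all made up for illustration; only the `summary` id comes from the question, and the real page's structure may well differ.

```php
<?php
// Hypothetical markup standing in for the real page; only id="summary"
// is taken from the question, everything else is an assumption.
$html = <<<HTML
<html><body>
  <div id="summary">
    <p class="review">Lovely stay, friendly hosts.</p>
    <p class="review">Clean rooms and a great breakfast.</p>
    <p class="note">Reviews are from verified guests.</p>
  </div>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select every <p class="review"> nested inside the element with id="summary".
$nodes = $xpath->query('//div[@id="summary"]//p[@class="review"]');

$reviews = [];
foreach ($nodes as $node) {
    $reviews[] = trim($node->textContent);
}

print_r($reviews);
?>
```

To try this against the live page, `$html` would be replaced with the result of `file_get_contents(URL)`, and the XPath expression adjusted to whatever elements the page actually uses for its reviews.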