Question

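I am scraping data from a web page using the DOM classes.

Each div contains several sections, including the review, image, date, rating, and so on.

Below is the code that scrapes the data for specific classes. But it only scrapes the first match for each class. How can I iterate so that I get the details from all of the blocks?

Here is my code: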

libxml_use_internal_errors(true);
$html= file_get_contents('http://www.yelp.com/biz/franchino-san-francisco?start=80');

// Pass the fetched HTML to loadHTML() unchanged: escapeshellarg() and nl2br() alter the markup before it is parsed.

$classname = 'rating-qualifier';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}


$classname = 'review_comment ieSucks';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}

$meta = $dom->documentElement->getElementsByTagName("meta");
echo $meta->item(0)->getAttribute('content');

Output: http://codepad.viper-7.com/j0cTNi
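
One way to get every match rather than only the first: `DOMXPath::query()` returns a `DOMNodeList`, and `item(0)` picks just its first entry, so looping over the list with `foreach` visits all of them. A minimal sketch, assuming a small inline HTML sample in place of the Yelp page (the markup and the helper name `textsByClass` are hypothetical, not part of the original code):

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
  <div class="review-wrapper"><span class="rating-qualifier">1/1/2014</span></div>
  <div class="review-wrapper"><span class="rating-qualifier">2/2/2014</span></div>
</body></html>
HTML;

// Hypothetical helper: return the trimmed text of every element carrying $classname.
function textsByClass(DOMXPath $xpath, string $classname): array
{
    // Token match on the class attribute, so class="a b" still matches "a".
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Print every match, not just item(0).
foreach (textsByClass($xpath, 'rating-qualifier') as $text) {
    echo $text, "\n";
}
```

This gives a flat list of values for one class; it does not by itself keep the date and the comment of the same review together.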

Update

http://codepad.viper-7.com/lHS9jk

Here I added:

$classname = 'review-wrapper';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");

foreach($results as $node)
{
  // scraping code here
}

But it scrapes the same class values on every iteration. See the result: http://codepad.viper-7.com/lHS9jk
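
A likely cause of the repeated values is that an XPath expression starting with `//` searches the whole document, even when it runs inside the loop. Passing the current node as the second argument to `DOMXPath::query()` and starting the expression with `.//` restricts the search to that block. A minimal sketch, assuming hypothetical sample markup and the class names from the code above; the real page structure may differ:

```php
<?php
// Hypothetical sample markup: two review blocks, each with its own date and comment.
$html = <<<HTML
<html><body>
  <div class="review-wrapper">
    <span class="rating-qualifier">1/1/2014</span>
    <p class="review_comment ieSucks">Great pasta.</p>
  </div>
  <div class="review-wrapper">
    <span class="rating-qualifier">2/2/2014</span>
    <p class="review_comment ieSucks">Slow service.</p>
  </div>
</body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$reviews = [];
foreach ($xpath->query("//*[@class='review-wrapper']") as $block) {
    // ".//" is relative to $block, the context node passed as the second argument.
    $date    = $xpath->query(".//*[@class='rating-qualifier']", $block);
    $comment = $xpath->query(".//*[@class='review_comment ieSucks']", $block);

    // Store null when a block lacks one of the fields.
    $reviews[] = [
        'date'    => $date->length > 0 ? trim($date->item(0)->nodeValue) : null,
        'comment' => $comment->length > 0 ? trim($comment->item(0)->nodeValue) : null,
    ];
}

print_r($reviews);
```

The document is parsed once and the same `$xpath` object is reused, so there is no need to create a new `DOMDocument` for each class name.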

Iterate over all class blocks using DOM

0 answers: