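I want to parse the links, names and prices of some products. Below is my code. I'm running into problems because I don't know how to get the product link and name. The price is fine, I have that working. Pagination doesn't work either.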
<h2>Telefonai Pigu</h2>
</br>
<?php
include_once('simple_html_dom.php');
$url = "http://pigu.lt/foto_gsm_mp3/mobilieji_telefonai/";
// Start from the main page
$nextLink = $url;
// Loop on each next Link as long as it exists
while ($nextLink) {
    echo "<hr>nextLink: $nextLink<br>";
    //Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a url
    $html->load_file($nextLink);
    $phones = $html->find('div#productList span.product');
    foreach($phones as $phone) {
        // Get the link
        $linkas = $phone->href;
        // Get the name
        $pavadinimas = $phone->find('a[alt]', 0)->plaintext;
        // Get the name price and extract the useful part using regex
        $kaina = $phone->find('strong[class=nw]', 0)->plaintext;
        // This captures the integer part of decimal numbers: In "123,45" will capture "123"... Use @([\d,]+),?@ to capture the decimal part too
        echo $pavadinimas, " #----# ", $kaina, " #----# ", $linkas, "<br>";
        //$query = "insert into telefonai (pavadinimas,kaina,linkas) VALUES (?,?,?)";
        // $this->db->query($query, array($pavadinimas,$kaina, $linkas));
    }
    // Extract the next link, if not found return NULL
    $nextLink = ( ($temp = $html->find('div.pagination a[="rel"]', 0)) ? "https://www.pigu.lt".$temp->href : NULL );
    // Clear DOM object
    $html->clear();
    unset($html);
}
?>
Output:
nextLink: http://pigu.lt/foto_gsm_mp3/mobilieji_telefonai/
A PHP Error was encountered
Severity: Notice
Message: Trying to get property of non-object
Filename: views/pigu_view.php
Line Number: 26
#----# 999,00 Lt #----#
A PHP Error was encountered
Severity: Notice
Message: Trying to get property of non-object
Filename: views/pigu_view.php
Line Number: 26
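Answer 0 (score: 1)
Look carefully at the source of the page you are working with, and pick the nodes you want based on that. It is normal that code written for another website does not work here, because the two sites do not share the same markup or structure.
Let's go through it step by step.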
$phones = $html->find('div#productList span.product');
gives you all the "phone containers", or "blocks". Each block has the following structure:
<span class="product">
<div class="fakeProductContainer">
<p class="productPhoto">
<span class="">
<span class="flags flag-disc-value" title="Akcija"><strong>500<br><span class="currencySymbol">Lt</span></strong></span>
<span class="flags freeShipping" title="Nemokamas prekių atsiemimas į POST24 paštomatus. Pasiūlymas galioja iki sausio 31 d."></span>
</span>
<a href="/foto_gsm_mp3/mobilieji_telefonai/telefonas_sony_xperia_acro_s?id=4522595" title="Telefonas Sony Xperia acro S" class="photo-medium nobr"><img src="http://lt1.pigugroup.eu//colours/48355/16/4835516/c503caf69ad97d889842a5fd5b3ff372_medium.jpg" title="Telefonas Sony Xperia acro S" alt="Telefonas Sony Xperia acro S"></a>
</p>
<div class="price">
<strong class="nw">999,00 Lt</strong>
<del class="nw">1.499,00 Lt *</del>
</div>
<h3><a href="/foto_gsm_mp3/mobilieji_telefonai/telefonas_sony_xperia_acro_s?id=4522595" title="Telefonas Sony Xperia acro S">Sony Xperia acro S</a></h3>
<p class="descFields">
3G: <em>HSDPA 14.4 Mbps, HSUPA 5.76 Mbps</em><br>
GPS: <em>Taip</em><br>
NFC: <em>Taip</em><br>
Operacinė sistema: <em>Android OS</em><br>
</p>
</div>
</span>
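The anchor that holds the product link sits inside <p class="productPhoto">, and it is the only anchor there, so you can retrieve it with $linkas = $phone->find('p.productPhoto a', 0)->href;
(then complete it with the site root, because the href is only a relative link)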
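Completing the relative link can be sketched as a small function. This is only an illustration: toAbsoluteUrl is a hypothetical helper, not part of simple_html_dom, and it assumes the site root is http://pigu.lt and that hrefs are either absolute or root-relative, as in the block shown above.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): prefix a root-relative
// href with the site origin and leave absolute URLs untouched.
// Assumption: hrefs are absolute or start at the site root, as in the sample block.
function toAbsoluteUrl(string $href, string $origin = 'http://pigu.lt'): string
{
    // Already absolute (http:// or https://): return it unchanged.
    if (preg_match('@^https?://@i', $href)) {
        return $href;
    }
    // Join origin and path with exactly one slash between them.
    return rtrim($origin, '/') . '/' . ltrim($href, '/');
}

echo toAbsoluteUrl('/foto_gsm_mp3/mobilieji_telefonai/telefonas_sony_xperia_acro_s?id=4522595'), "\n";
```

In the scraper this would wrap the href, for example toAbsoluteUrl($phone->find('p.productPhoto a', 0)->href), instead of concatenating the root by hand.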
The product name is in the <h3> tag, so in the same way we retrieve it with $pavadinimas = $phone->find('h3 a', 0)->plaintext;
The price is inside <div class="price"><strong>, so again we retrieve it with $kaina = $phone->find('div[class=price] strong', 0)->plaintext;
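However, not every phone shows a price, so we have to check that the price node was actually found before reading it. Reading ->plaintext on a missing node is what produces the "Trying to get property of non-object" notice.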
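The check described above can be sketched as a small guard. textOrDefault is a hypothetical helper name, and the stdClass object only stands in for a node returned by find(), so the snippet runs without the library.

```php
<?php
// Hypothetical helper: return the node's trimmed text, or a fallback when the
// lookup produced no node (find() with an index yields null on a miss).
function textOrDefault($node, string $default = ''): string
{
    return is_object($node) ? trim((string) $node->plaintext) : $default;
}

// Stand-in for a found node; in the real script this would be
// $phone->find('div[class=price] strong', 0).
$found = new stdClass();
$found->plaintext = ' 999,00 Lt ';

echo textOrDefault($found, '000'), "\n"; // node present: its text is used
echo textOrDefault(null, '000'), "\n";   // node missing: the fallback is used
```

The same guard would work for the link and the name, which the original code reads without checking.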
Finally, the HTML that contains the next link looks like this:
<div id="ListFootPannel">
<div class="pages-list">
<strong>1</strong>
<a href="/foto_gsm_mp3/mobilieji_telefonai?page=2">2</a>
<a href="/foto_gsm_mp3/mobilieji_telefonai?page=3">3</a>
<a href="/foto_gsm_mp3/mobilieji_telefonai?page=4">4</a>
<a href="/foto_gsm_mp3/mobilieji_telefonai?page=5">5</a>
<a href="/foto_gsm_mp3/mobilieji_telefonai?page=6">6</a>
<a rel="next" href="/foto_gsm_mp3/mobilieji_telefonai?page=2">Toliau</a>
</div>
<div class="pages-info">
Prekių
</div>
</div>
So the element we are interested in is <a rel="next">, which can be retrieved with $html->find('div#ListFootPannel a[rel="next"]', 0)
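To try the rel="next" lookup without fetching the live site, here is a minimal sketch. Note that it swaps in PHP's built-in DOMDocument and DOMXPath in place of simple_html_dom so that it runs on its own, and the markup is a shortened copy of the pagination block above.

```php
<?php
// Shortened copy of the pagination markup shown above.
$snippet = <<<HTML
<div id="ListFootPannel">
  <div class="pages-list">
    <strong>1</strong>
    <a href="/foto_gsm_mp3/mobilieji_telefonai?page=2">2</a>
    <a rel="next" href="/foto_gsm_mp3/mobilieji_telefonai?page=2">Toliau</a>
  </div>
</div>
HTML;

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($snippet);
libxml_clear_errors();

// XPath equivalent of the CSS selector div#ListFootPannel a[rel="next"].
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="ListFootPannel"]//a[@rel="next"]');

// item(0) is null when nothing matched, which would end a while ($nextLink) loop.
$next = $nodes->item(0);
$nextLink = $next ? 'http://pigu.lt' . $next->getAttribute('href') : null;

echo $nextLink, "\n";
```

On the last page there is presumably no rel="next" anchor, so $nextLink becomes null and the loop stops.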
Putting these changes into the original code gives:
include_once('simple_html_dom.php');
$url = "http://pigu.lt/foto_gsm_mp3/mobilieji_telefonai/";
// Start from the main page
$nextLink = $url;
// Loop over each next link for as long as one exists
while ($nextLink) {
    echo "nextLink: $nextLink<br>";
    // Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a URL
    $html->load_file($nextLink);
    ////////////////////////////////////////////////
    /// Get phone blocks and extract useful info ///
    ////////////////////////////////////////////////
    $phones = $html->find('div#productList span.product');
    foreach ($phones as $phone) {
        // Get the link (the href is relative, so prefix the site root)
        $linkas = "http://pigu.lt" . $phone->find('p.productPhoto a', 0)->href;
        // Get the name
        $pavadinimas = $phone->find('h3 a', 0)->plaintext;
        // Default used when the block has no price (find() with an index returns null on a miss)
        $kaina = "000";
        if ( $tempPrice = $phone->find('div[class=price] strong', 0) ) {
            // Capture the first run of digits: "999,00 Lt" gives "999".
            // A thousands separator cuts the match short ("1.499,00 Lt" gives "1"); use @([\d.,]+)@ to keep separators and decimals
            if ( preg_match('@(\d+),?@', $tempPrice->plaintext, $matches) ) {
                $kaina = $matches[1];
            }
        }
        echo $pavadinimas, " #----# ", $kaina, " #----# ", $linkas, "<br>";
    }
    ////////////////////////////////////////////////
    ////////////////////////////////////////////////
    // Extract the next link; NULL ends the loop when there is none
    $nextLink = ( ($temp = $html->find('div#ListFootPannel a[rel="next"]', 0)) ? "http://pigu.lt".$temp->href : NULL );
    // Clear DOM object
    $html->clear();
    unset($html);
    echo "<hr>";
}
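If the full price is needed as a number instead of only its leading digits, one possible approach is sketched below. parsePrice is a hypothetical helper, and it assumes the format seen in the sample block, where "." is the thousands separator and "," the decimal separator.

```php
<?php
// Hypothetical helper: turn a price string such as "1.499,00 Lt *" into a float.
// Assumption: "." separates thousands and "," separates decimals, as in the sample block.
function parsePrice(string $text): ?float
{
    // Grab the first run of digits, dots and commas, e.g. "1.499,00".
    if (!preg_match('@\d[\d.,]*@', $text, $m)) {
        return null; // no number in the text
    }
    // Drop thousands separators, then turn the decimal comma into a dot.
    $normalized = str_replace(['.', ','], ['', '.'], $m[0]);
    return (float) $normalized;
}

var_dump(parsePrice('999,00 Lt'));
var_dump(parsePrice('1.499,00 Lt *'));
var_dump(parsePrice('n/a'));
```

Returning null for text without digits keeps the "price missing" case distinct from a real price of zero, which the "000" placeholder cannot do.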