This is my first time using cURL and selecting elements with XPath. My code so far is below.
<?php
//$curl = curl_init('https://silvergoldbull.com/');
$curl = curl_init('https://e-katalog.lkpp.go.id/backend/katalog/list_produk/77/?isSubmitted=1&orderBy=hargaAsc&list=5&manufakturId=all&penyediaId=all&page=1');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
$page = curl_exec($curl);
if (curl_errno($curl)) { // check for execution errors
    echo 'Scraper error: ' . curl_error($curl);
    exit;
}
echo $page;
curl_close($curl);
$page_doc = new DOMDocument;
libxml_use_internal_errors(true);
$page_doc->loadHTML($page);
libxml_clear_errors(); //remove errors for yucky html
$page_doc_xpath = new DOMXPath($page_doc);
//$result = $page_doc_xpath->evaluate('/html/body/div[2]/div[5]/div/div/div[3]/div[3]/div/table/tbody/tr[1]/td/div/div[3]/div/div[1]/div/ol/li/a');
$result = $page_doc_xpath->evaluate('string(/html/body/div[2]/div[5]/div/div/div[3]/div[3]/div/table/tbody/tr[1]/td/div/div[3]/div/div[1]/div/ol/li/a)');
echo "----";
echo $result;
/* $silverprice = $page_doc_xpath->evaluate('string(/html/body/nav/div[3]/div/div/ul/li[1]/a/span/div/div/strong)');
echo $silverprice; */
/* $buyers = tree.xpath('//div[@title="buyer-name"]/text()') */
/* $regex = '/<div id="case_textlist">(.*?)<\/div>/s';
if ( preg_match($regex, $page, $list) )
echo $list[0];
else
print "Not found"; */
?>
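With this code I can retrieve Computer Supplies, the text in the green bracket at the end of the page. But how do I retrieve the rest, the text in the red brackets?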
Update:
I changed $result to the following code, but it still doesn't work. It returns only Networking instead of everything in the brackets:
$result = $page_doc_xpath->evaluate('string(//div[@class="categoryPath"]//a)');
Answer 0: (score: 0)
In my case, I use Goutte to scrape the data:
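use Goutte\Client;

$client = new Client();
// $url holds the address of the page to scrape
$crawler = $client->request('GET', $url);
// extract() takes an array of attributes; '_text' stands for the node's text
$titles = $crawler->filter('.listing--name')->extract(['_text']);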
By selecting on a class or id, you can get the text of the matching nodes...
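The same idea also works with plain DOMXPath, without an extra library. In XPath 1.0, string() applied to a node-set converts only the first node in document order, which would explain why the update returns just Networking. Here is a minimal sketch that uses query() and loops over every match instead. The inline HTML is a made-up stand-in for the real page, so the markup around categoryPath is an assumption:

```php
<?php
// Stand-in markup; the structure of the real page is an assumption here.
$html = <<<HTML
<html><body>
  <div class="categoryPath">
    <ol>
      <li><a href="#">Networking</a></li>
      <li><a href="#">Computer Supplies</a></li>
    </ol>
  </div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate malformed real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// query() returns every matching node, not just the first one.
$categories = [];
foreach ($xpath->query('//div[@class="categoryPath"]//a') as $link) {
    $categories[] = trim($link->textContent);
}

echo implode(', ', $categories), PHP_EOL;
```

Against the real page, $html would be replaced by the $page string returned by curl_exec(). Note that @class="categoryPath" only matches when the attribute is exactly that value; if the element carries several classes, contains(@class, "categoryPath") may be needed instead.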