Question

This is my first time using cURL and selecting elements with XPath. My current code is below.

<?php
//$curl = curl_init('https://silvergoldbull.com/');
$curl = curl_init('https://e-katalog.lkpp.go.id/backend/katalog/list_produk/77/?isSubmitted=1&orderBy=hargaAsc&list=5&manufakturId=all&penyediaId=all&page=1');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

$page = curl_exec($curl);

if(curl_errno($curl)) // check for execution errors
{
    echo 'Scraper error: ' . curl_error($curl);
    exit;
}
echo $page; 
curl_close($curl);
$page_doc = new DOMDocument;
libxml_use_internal_errors(true);
$page_doc->loadHTML($page);
libxml_clear_errors(); //remove errors for yucky html

$page_doc_xpath = new DOMXPath($page_doc);
//$result = $page_doc_xpath->evaluate('/html/body/div[2]/div[5]/div/div/div[3]/div[3]/div/table/tbody/tr[1]/td/div/div[3]/div/div[1]/div/ol/li/a');
$result = $page_doc_xpath->evaluate('string(/html/body/div[2]/div[5]/div/div/div[3]/div[3]/div/table/tbody/tr[1]/td/div/div[3]/div/div[1]/div/ol/li/a)');
echo "----";
echo $result;

/* $silverprice = $page_doc_xpath->evaluate('string(/html/body/nav/div[3]/div/div/ul/li[1]/a/span/div/div/strong)');
echo $silverprice; */

/* $buyers = tree.xpath('//div[@title="buyer-name"]/text()') */
/* $regex = '/<div id="case_textlist">(.*?)<\/div>/s';
if ( preg_match($regex, $page, $list) )
    echo $list[0];
else 
    print "Not found";  */
?>

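With this code I can retrieve "Computer Supplies", which is marked with the green bracket at the end of the page. But how do I retrieve the rest, marked with the red brackets?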

Update: I changed $result to the following code, but it still doesn't work. It returns only "Networking" instead of everything in the brackets.

$result = $page_doc_xpath->evaluate('string(//div[@class="categoryPath"]//a)');

Answer 1

In my case, I use Goutte to scrape the data:

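use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', $url); // $url is the page you want to scrape
// extract() takes an array of attribute names; '_text' means the node's text content
$titles = $crawler->filter('.listing--name')->extract(['_text']);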

By filtering on a class or an id like this, you get the text of every matching node...
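
If you would rather stay with the DOMXPath code from the question, the likely reason only "Networking" comes back is that XPath's string() function, when given a node-set, returns the string value of the first node only. A minimal sketch of collecting every match with query() instead is below. The inline HTML is a made-up stand-in for the real page, and the categoryPath class is taken from the question's update, so the actual markup may differ.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup may differ.
$page = <<<HTML
<html><body>
  <div class="categoryPath">
    <ol><li><a href="#">Networking</a></li></ol>
  </div>
  <div class="categoryPath">
    <ol><li><a href="#">Computer Supplies</a></li></ol>
  </div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate malformed real-world HTML
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList holding every matching <a>, not just the first one.
$categories = [];
foreach ($xpath->query('//div[@class="categoryPath"]//a') as $link) {
    $categories[] = trim($link->textContent);
}

echo implode(PHP_EOL, $categories), PHP_EOL;
```

In the original script, $page would come from curl_exec() rather than a heredoc, and the loop body is where you would print or store each value.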
