Question

I'm trying to grab the product images from a website with this code:

<?php

$url="http://www.akasa.com.tw/update.php?tpl=product/cpu.gallery.tpl&type=Fanless Chassis&type_sub=Fanless Mini ITX&model=A-ITX19-A1B";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, "User-Agent: Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_ENCODING, "");
$pagebody=curl_exec($ch);

curl_close ($ch);

$html=str_get_html($pagebody);

print_r($html);

PHPStorm lets me inspect the variables, and $pagebody ends up with this value:

<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. If you think this is an error, please contact the webmaster. <br><br>Your support ID is: 4977197659118049932</body></html>

http://www.akasa.com.tw/update.php?tpl=product/cpu.gallery.tpl&type=Fanless Chassis&type_sub=Fanless Mini ITX&model=A-ITX19-A1B

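When I open it in a browser I see the whole page, and the page source gives me all the information I need, but I want to scrape some of the images automatically. Any idea how to figure out what I need to send with cURL so that the site doesn't treat me as a bot (I'm guessing that's the problem), or how else to work around this?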

Answer 1

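Basically, you need to encode the query-string parameters so that all special characters (here, the spaces in values such as "Fanless Chassis") are correctly represented in the URL. You can use http_build_query for this, so your URL construction could look like this: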

$url = implode('?', [
    'http://www.akasa.com.tw/update.php',
    http_build_query([
        'tpl'      => 'product/cpu.gallery.tpl',
        'type'     => 'Fanless Chassis',
        'type_sub' => 'Fanless Mini ITX',
        'model'    => 'A-ITX19-A1B',
    ])
]);

Then the rest of your code stays the same.
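Putting the two pieces together, a minimal sketch of the full script might look like the following. It assumes the cURL extension is loaded; buildGalleryUrl() and fetchPage() are hypothetical helper names introduced here for illustration, not part of the original answer. It also drops the "User-Agent: " prefix from the CURLOPT_USERAGENT value, because that option takes only the header value (the original string would send the header name twice). Whether the server accepts the request once the URL is encoded is not something this sketch can guarantee.

```php
<?php
// Sketch: build the update.php URL with an encoded query string, then fetch it with cURL.
// buildGalleryUrl() and fetchPage() are hypothetical helpers for illustration.

/**
 * Append a percent-encoded query string to a base URL.
 */
function buildGalleryUrl(string $base, array $params): string
{
    // http_build_query() encodes spaces as "+" and "/" as "%2F" (RFC 1738 style).
    return $base . '?' . http_build_query($params);
}

/**
 * Fetch a page with cURL and return the body; throws if the transfer itself fails.
 */
function fetchPage(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Only the header value goes here; cURL adds the "User-Agent:" name itself.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, ''); // accept any content encoding cURL supports

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $body;
}

// Build the URL only; no network request happens at this point.
$url = buildGalleryUrl('http://www.akasa.com.tw/update.php', [
    'tpl'      => 'product/cpu.gallery.tpl',
    'type'     => 'Fanless Chassis',
    'type_sub' => 'Fanless Mini ITX',
    'model'    => 'A-ITX19-A1B',
]);
```

To actually run the request, call $pagebody = fetchPage($url); and pass the result to str_get_html() as in the question (str_get_html() comes from the Simple HTML DOM library, which has to be included separately).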

cURL scraping gets me "Request Rejected": "The requested URL was rejected"