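I am trying to open an HTML page with cURL, extract the captcha image URL from it, and save that image as a PNG. I can do both, but the image displayed on screen and the saved image file are different. How can I fix this?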
//Get page contents first
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://www.gstsearch.in/track-provisional-id.html");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookiefile.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookiefile.txt");
$pageContent = curl_exec ($ch);
$errNo = curl_errno($ch); //CURL error code
curl_close ($ch);
if ($errNo == 0) {
    $imgURL = getCaptcha($pageContent); //Get captcha image
    saveCaptcha($imgURL); //Save the captcha image as PNG
}
else {
    $errorMsg = curl_strerror($errNo);
    echo "CURL error ({$errNo}):\n {$errorMsg}";
}
function getCaptcha($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $captchaImg = $dom->getElementById('captchacode');
    $imgSrc = $captchaImg->getAttribute('data-src');
    //URL of the current captcha image
    $imgURL = "https://www.gstsearch.in/{$imgSrc}";
    echo "<img src={$imgURL}>";
    return $imgURL;
}
function saveCaptcha($url) {
    $fp = fopen("captcha.png", 'w+');
    $sc = curl_init();
    curl_setopt($sc, CURLOPT_URL, $url);
    curl_setopt($sc, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($sc, CURLOPT_COOKIEFILE, "cookiefile.txt");
    curl_setopt($sc, CURLOPT_COOKIEJAR, "cookiefile.txt");
    curl_setopt($sc, CURLOPT_FILE, $fp);
    curl_setopt($sc, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($sc, CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_exec($sc);
    curl_close($sc);
    fclose($fp);
}
Update: I updated the code based on the suggestions, but the same thing still happens. What am I missing?
Answer 0 (score: 1)
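I agree with @jeroen: the remote site thinks there are two different users, one posting the information and another retrieving the CAPTCHA :)
You can store (and reuse) the session_id with the following: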
//this is to pass `session_id` between requests
curl_setopt($ch, CURLOPT_COOKIEFILE, $some_path . 'cookie.txt');
//this is to store cookies for future requests, i.e. if you want to retain your session
curl_setopt($ch, CURLOPT_COOKIEJAR, $some_path . 'cookie.txt');
You should use these for both requests. That way the site will treat you as the same user rather than as two different ones (as it does now).
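As a minimal sketch of that idea, both requests can build their options from a single helper so they always point at the same cookie file. The function names sessionOptions and fetchInSession, and the cookie path in the usage comment, are hypothetical and not part of the original code; only standard cURL options are used.

<?php
// Hypothetical helper: the options every request in the session shares.
function sessionOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'Mozilla/5.0',
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
    ];
}

// Hypothetical helper: runs one request and returns the body, or null on failure.
function fetchInSession(string $url, string $cookieFile): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionOptions($url, $cookieFile));
    $body = curl_exec($ch);
    unset($ch); // release the handle so the cookie jar is written before the next request
    return $body === false ? null : $body;
}

// Usage (assumed absolute path, so both requests resolve the same file):
// $cookieFile = __DIR__ . '/cookie.txt';
// $page = fetchInSession('https://www.gstsearch.in/track-provisional-id.html', $cookieFile);
// $png  = fetchInSession($imgURL, $cookieFile); // $imgURL as extracted from $page
// file_put_contents('captcha.png', $png);

One further possibility, which is an inference and not something confirmed in this thread: an <img> tag that points at the remote captcha URL makes the browser send its own request without these cookies, so the browser may receive a different image than the one cURL saved. Displaying the saved captcha.png instead would avoid that second request.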