我正在尝试获取一个简单的字符串,其中包含我使用search-by-image搜索过的图像的描述。所以我设置了search_by_google.php页面:
<?php
$url = $_REQUEST['url'];
if(empty($_REQUEST['raw'])){
$raw = false;
}
else{
$raw = true;
}
echo fetch_google($url, $raw);
function fetch_google($u, $raw, $terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
{
$ch = curl_init();
$url = 'http://www.google.com/imghp?hl=en&tab=wi';
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, TRUE);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt ($ch, CURLOPT_VERBOSE,true);
curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT,120);
curl_setopt ($ch, CURLOPT_TIMEOUT,120);
curl_setopt ($ch, CURLOPT_MAXREDIRS,10);
curl_setopt ($ch, CURLOPT_COOKIEFILE,"./cookie.txt");
curl_setopt ($ch, CURLOPT_COOKIEJAR,"./cookie.txt");
curl_setopt ($ch, CURLOPT_VERBOSE,true);
curl_exec($ch);
$searched="";
for($i=0;$i<=$numpages;$i++)
{
$ch = curl_init();
$url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($u);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, TRUE);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt ($ch, CURLOPT_VERBOSE,true);
curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/imghp?hl=en&tab=wi');
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT,120);
curl_setopt ($ch, CURLOPT_TIMEOUT,120);
curl_setopt ($ch, CURLOPT_MAXREDIRS,10);
curl_setopt ($ch, CURLOPT_COOKIEFILE,"cookie.txt");
curl_setopt ($ch, CURLOPT_COOKIEJAR,"cookie.txt");
$searched=$searched.curl_exec ($ch);
curl_close ($ch);
}
if($raw){
return $searched;
}
else{
$matches = array();
preg_match('/Best guess for this image:[^<]+<a[^>]+>([^<]+)/', $searched, $matches);
return (count($matches) > 1 ? $matches[1] : false);
}
}
?>
我已经更改了所有卷曲选项但是如果我转到http://www.mysite.altervista.org/search_by_google.php?url=http://www.mysite.org/asdasd.jpg&raw=false
它让我说302 Moved
我已经改变了我的代码
curl_setopt ($ch, CURLOPT_HEADER, TRUE);
在第二个curl_init()中,现在它给了我这条消息:
EDIT 25/03/2014 19:34
我改变了我的代码,就像Sabuj Hassan所说,现在的日志是:
HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved
The document has moved here. HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved
The document has moved here.
答案 0 :(得分:7)
可能会发生以下情况:您的服务器上的卷曲会阻止重定向。因此,我建议您手动处理重定向。像这样:
首先你的卷曲功能。如果您愿意,可以添加其他卷曲选项:
function curl($url, $user_agent, $retry=0){
if($retry > 5){
print "Maximum 5 retries are done, skipping!\n";
return "in loop!";
}
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, TRUE);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
curl_setopt ($ch, CURLOPT_COOKIEFILE,"./cookie.txt");
curl_setopt ($ch, CURLOPT_COOKIEJAR,"./cookie.txt");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$result = curl_exec($ch);
curl_close($ch);
// handling the follow redirect
if(preg_match("|Location: (https?://\S+)|", $result, $m)){
print "Manually doing follow redirect!\n$m[1]\n";
return curl($m[1], $user_agent, $retry + 1);
}
// add another condition here if the location is like Location: /home/products/index.php
return $result;
}
以下是它应该如何调用:
$response = curl("http://www.google.com/", "Mozilla 5.0");
print "$response\n";
我正在解析Location:
标题中的关注链接。可能会发生链接未以http://
开头的情况。这种情况会在那里添加另一个条件。
答案 1 :(得分:4)
问题可能出在CURLOPT_FOLLOWLOCATION,如果在php.ini中启用了safe_mode或open_basedir,则该问题不可用
试试这个答案:http://au.php.net/manual/ro/function.curl-setopt.php#71313
将curl_exec()替换为评论中提供的curl_redir_exec()。
答案 2 :(得分:3)
检查您是否未达到每个IP的最大请求数。您可能会被重定向到验证码页面。 AFAIK反对谷歌规则使用机器人来查询谷歌。