How do I use nested loops to fetch HTML through proxies with curl?

Date: 2011-03-17 18:11:10

Tags: php mysql loops curl nested-loops

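I'm trying to fetch a page's source code through a proxy. It works as long as I only loop over the URLs and grab the source. But as soon as I also loop over the proxies, it slows down and times out. I don't get an error message; it just keeps running. Is this a proxy problem or a problem with my code? I'm new to PHP, so any help is much appreciated.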

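You can see the problem at pelican-cement.com/bbb.html. The project is an attempt to scrape data from certain pages, and we're only about halfway there. Here is the code: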

<html>
<body>

<?
$urls=explode("\n", $_POST['url']);
$proxies=explode("\n", $_POST['proxy']);

for ( $counter = 0; $counter <= 6; $counter++) {
for ( $count = 0; $count <= 6; $counter++) {

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL,$urls[$counter]);
 curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);
 curl_setopt($ch, CURLOPT_PROXY,$proxies[$count]);
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
 curl_setopt ($ch, CURLOPT_HEADER, 1); 
curl_exec ($ch); 
$curl_scraped_page = curl_exec($ch); 

$FileName = time();
$FileHandle = fopen($FileName, 'w') or die("can't open file");
fwrite($FileHandle, $curl_scraped_page);

$hostname="***";
$username="****";
$password="****";
$dbname="****";
$usertable="****";

$con=mysql_connect($hostname,$username, $password) or die ("<html><script language='JavaScript'>alert('Unable to connect to database! Please try again later.'),history.go(-1)</script></html>");
mysql_select_db($dbname ,$con);

$sql="INSERT INTO **** (time, ad1)
VALUES
('$FileName','$domains')";


if (!mysql_query($sql,$con))
  {
  die('Error: ' . mysql_error());
  }
echo "1 record added";

mysql_close($con);

fclose($FileHandle);

curl_close($ch);

echo $FileName; 

echo "<br/>";

sleep(1);

}
}

?>

</body>
</html>
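Looking at the code itself, the inner loop is written `for ( $count = 0; $count <= 6; $counter++)`: it increments `$counter` instead of `$count`, so `$count` never leaves 0 and the inner loop never ends, while `$counter` keeps growing past the end of `$urls`. That alone would make the script run indefinitely without an error. Each pass also calls `curl_exec` twice, and no curl timeout is set, so an unresponsive proxy can stall every request. Below is a minimal sketch of the loop with those points addressed, assuming the same `url` and `proxy` form fields; the helper name `fetch_via_proxy` and the timeout values are illustrative choices, not part of the original code.

```php
<?php
// Minimal sketch of the nested loop: every URL is fetched once through every proxy.
// Assumptions: the form fields are named "url" and "proxy" as in the question;
// fetch_via_proxy() is a hypothetical helper and the timeouts are illustrative.

/** Fetch $url through $proxy; returns the response (headers included) or false. */
function fetch_via_proxy(string $url, string $proxy)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // seconds to wait for the connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // seconds allowed for the whole request
    $page = curl_exec($ch);                       // one request per pair, not two
    curl_close($ch);
    return $page;
}

// trim() removes the "\r" left behind when a textarea submits CRLF line breaks;
// array_filter(..., 'strlen') then drops the empty lines.
$urls    = array_filter(array_map('trim', explode("\n", $_POST['url'] ?? '')), 'strlen');
$proxies = array_filter(array_map('trim', explode("\n", $_POST['proxy'] ?? '')), 'strlen');

// foreach has no counters to mix up, and it stops at the end of each list.
foreach ($urls as $url) {
    foreach ($proxies as $proxy) {
        $page = fetch_via_proxy($url, $proxy);
        if ($page === false) {
            echo htmlspecialchars("failed: $url via $proxy"), "<br/>";
            continue;
        }
        echo htmlspecialchars("fetched $url via $proxy (" . strlen($page) . " bytes)"), "<br/>";
        // ... write $page to a file and insert the database row here ...
    }
}
```

With 7 URLs and 7 proxies this makes 49 requests, so the per-request timeouts bound how long a run with dead proxies can take.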

1 Answer:

Answer 0 (score: 0)

If you are running this from a browser, you are running into the execution time limit: http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
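A quick way to check this is to print the current limit and, where the host permits it, lift it for the scraping request. A minimal sketch, assuming `set_time_limit()` has not been disabled by the host:

```php
<?php
// Minimal sketch: show the current execution time limit, then remove it for this run.
// Assumption: the host has not disabled set_time_limit() (some shared hosts do).

$limit = (int) ini_get('max_execution_time'); // 0 means "no limit"
echo 'max_execution_time: ' . ($limit === 0 ? 'no limit' : $limit . ' s') . PHP_EOL;

// set_time_limit(0) removes the limit for the rest of this request;
// it returns false when the call is not allowed.
if (!set_time_limit(0)) {
    echo 'set_time_limit() is not allowed here; try running the script from the CLI.' . PHP_EOL;
}
```

Per the PHP manual, on systems other than Windows the limit counts only the script's own execution time, not time spent in stream operations, system calls or database queries, so with slow proxies a web server or browser timeout may be what actually ends the request.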

You can run it in CLI mode to avoid hitting the timeout; the CLI sets max_execution_time to 0 (no limit) by default.
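Running from the CLI needs one change: no form is submitted there, so `$_POST` is empty and the lists have to come from somewhere else. A minimal sketch, assuming the URLs and proxies are kept in two text files passed as arguments; `scrape.php`, `urls.txt`, `proxies.txt` and the helper names are hypothetical:

```php
<?php
// Minimal sketch: load the URL and proxy lists from a form POST (browser) or from
// two files named on the command line (CLI), e.g.
//   php scrape.php urls.txt proxies.txt
// Assumptions: file and helper names are hypothetical; one entry per line.

/** Trim each line (also removing CRLF's "\r") and drop blank lines. */
function clean_lines(array $lines): array
{
    $lines = array_map('trim', $lines);
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

/** Read a list file; returns null if it is missing or unreadable. */
function read_lines(string $path): ?array
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    return $lines === false ? null : clean_lines($lines);
}

/** Returns [$urls, $proxies], or null when the CLI arguments are missing or bad. */
function load_lists(string $sapi, array $argv, array $post): ?array
{
    if ($sapi === 'cli') {
        $urls    = isset($argv[1]) ? read_lines($argv[1]) : null;
        $proxies = isset($argv[2]) ? read_lines($argv[2]) : null;
        return ($urls === null || $proxies === null) ? null : [$urls, $proxies];
    }
    return [
        clean_lines(explode("\n", $post['url'] ?? '')),
        clean_lines(explode("\n", $post['proxy'] ?? '')),
    ];
}

$lists = load_lists(PHP_SAPI, $argv ?? [], $_POST);
if ($lists === null) {
    echo 'usage: php scrape.php URL_FILE PROXY_FILE' . PHP_EOL;
} else {
    list($urls, $proxies) = $lists;
    // ... run the nested loop over $urls and $proxies here ...
}
```

Passing `PHP_SAPI`, `$argv` and `$_POST` in as parameters keeps `load_lists()` easy to try out with made-up input in either mode.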