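I really tried to find an answer to this here and on Google, but I couldn't find one.
My PHP code simply stops running partway through, and it stops at a different point on every run. I don't think the cURL functions are the problem, because the script sometimes stops before a cURL call and sometimes after one. I don't think it is a bug in the code either, because the parts that do run produce correct results. My guess is that this is a "timeout" issue on the shared host.
My code basically does web scraping using the simple_html_dom library and cURL functions. I run it on shared web hosting (HostGator). I have also tried running it through a cron job, but that does not work either.
I have already set the following at the start of the code (and also changed the corresponding values in php.ini), but it did not help: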
ignore_user_abort(true);
set_time_limit(0);
ini_set('max_execution_time', 0);
ini_set('memory_limit',-1);
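One way to narrow down where and why the script stops is to log something at shutdown. The following is only a minimal sketch: `describe_shutdown()` and the `scrape_shutdown.log` path are hypothetical names, not part of the original script. A shutdown function runs on a normal finish and on PHP fatal errors, but not when the process is killed from outside, so a run that leaves no log line at all would point toward the host terminating the process.

```php
<?php
// Hypothetical helper (not part of the original script): turns the last PHP error
// and the elapsed time into a single log line.
function describe_shutdown($lastError, $elapsedSeconds) {
    $elapsed = number_format($elapsedSeconds, 1, '.', '');
    if ($lastError === null) {
        // No PHP error was recorded: a normal finish or an explicit exit.
        return "stopped after {$elapsed}s with no PHP error recorded";
    }
    return "stopped after {$elapsed}s: {$lastError['message']} "
         . "in {$lastError['file']}:{$lastError['line']}";
}

$scriptStart = microtime(true);

// Runs when the script finishes, calls exit, or hits a fatal error
// (including PHP's own max_execution_time limit).
register_shutdown_function(function () use ($scriptStart) {
    $line = describe_shutdown(error_get_last(), microtime(true) - $scriptStart);
    // Assumed log location; any writable path would do.
    error_log(date('c') . ' ' . $line . PHP_EOL, 3, __DIR__ . '/scrape_shutdown.log');
});
```

If the log shows a fatal error, the problem is inside PHP. If some runs leave no entry at all, something outside PHP is likely ending the process.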
Full code (I shortened it a little; in the original I set up several different dates and call the "scrap" function multiple times):
require('simple_html_dom.php');
//get today's date
$today = date('Y-m-d');
if (date('H') < '9') {
$date_period = "today";
$date_period_date = date('Y-m-d');
$puDay = date('j');
$puMonth = date('n');
$puYear = date('Y');
$doDay = date('j', strtotime(' + 1 days'));
$doMonth = date('n', strtotime(' + 1 days'));
$doYear = date('Y', strtotime(' + 1 days'));
scrap($puDay,$puMonth,$puYear,$doDay,$doMonth,$doYear,$date_period, $today, $date_period_date, $location_id,$location,$city);
unset($date_period,$date_period_date,$puDay,$puMonth,$puYear,$doDay,$doMonth,$doYear);
}
//functions
function scrap($puDay_aux, $puMonth_aux, $puYear_aux, $doDay_aux, $doMonth_aux, $doYear_aux, $period_id_aux, $curDate_aux, $periodDate_aux, $location_id_aux,$location_aux,$city_aux){
$bad_proxy = "";
$check = 1;
do{
$link = "my link";
$best_proxy = get_best_proxy($link, $bad_proxy);
$scraped_page = curl($link, $best_proxy);
$html = new simple_html_dom();
$html->load($scraped_page);
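// Check the raw response string: $html is an object, so empty($html) is never true.
$check_end = strpos($scraped_page,'</html>');
if(!empty($scraped_page)) {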
if ($check_end===FALSE) {
$check = $check + 1;
$bad_proxy = $best_proxy;
} else {
foreach($html->find('table[class=ResultRow]') as $element)
{
$supplier = $element->find('h4',0);
unset($supplier,$supplier_aux,$car,$car_aux,$price,$price_aux,$priceBRL);
}
$html->clear();
unset($link,$html,$best_proxy,$stream,$context);
$check = 5;
}
} else {
$check = $check + 1;
}
} while ($check<5);
}
function get_best_proxy($link, $bad_proxy){
$proxy_array = array(
'177.184.144.130:8080',
'177.6.147.202:8080',
'187.44.1.167:8080',
'170.82.228.42:8080',
'177.72.1.102:8080',
'138.185.101.20:8080',
'187.102.149.178:8080',
'177.32.12.127:8080',
'189.38.3.9:8080',
'138.185.101.21:8080'
);
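$proxy_speed_result = array();
// Key each result by the proxy's own index so the lookup below maps back to the
// right proxy. With a separate counter, skipping $bad_proxy shifted the indexes
// and the bad proxy itself could be returned.
foreach ($proxy_array as $index => $proxy){
if ($proxy != $bad_proxy) {
$proxy_speed = proxy_speed($proxy, $link);
$proxy_speed_result[$index] = $proxy_speed;
if ($proxy_speed<9999999){break;}
}
}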
$min = array_keys($proxy_speed_result, min($proxy_speed_result));
$min_aux = $min[0];
$proxy_output = $proxy_array[$min_aux];
return($proxy_output);
}
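The selection logic in get_best_proxy() can be sketched without any network access. This is a hypothetical variant, not the original function: `pick_fastest_proxy()` and its `$measure` callable are names assumed for illustration. Results are keyed by the proxy address itself, so a skipped proxy can never shift the lookup.

```php
<?php
// Hypothetical variant of get_best_proxy(): $measure is any callable that takes a
// proxy address and returns its response time, so the logic can be tried offline.
function pick_fastest_proxy(array $proxies, $badProxy, callable $measure, $failed = 9999999) {
    $speeds = array();
    foreach ($proxies as $proxy) {
        if ($proxy === $badProxy) {
            continue; // skip the proxy that just failed
        }
        $speeds[$proxy] = $measure($proxy); // key by address, not by a counter
        if ($speeds[$proxy] < $failed) {
            break; // same early exit as the original: first working proxy wins
        }
    }
    if (empty($speeds)) {
        return null; // nothing left to try
    }
    asort($speeds);  // sort by time while keeping the address keys
    reset($speeds);
    return key($speeds); // address of the fastest measured proxy
}

// Made-up timings standing in for real measurements.
$timings = array('a:8080' => 9999999, 'b:8080' => 3, 'c:8080' => 1);
$best = pick_fastest_proxy(array_keys($timings), 'a:8080', function ($p) use ($timings) {
    return $timings[$p];
});
```

Passing the measurement in as a callable is a design choice made here only so the selection can be checked with made-up timings; in the real script it would wrap proxy_speed().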
function proxy_speed($proxy, $link) {
$link = "my link here";
$loadingtime = time();
$theHeader = curl_init($link);
curl_setopt($theHeader, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($theHeader, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($theHeader, CURLOPT_AUTOREFERER, 1);
curl_setopt($theHeader, CURLOPT_MAXREDIRS, 10);
curl_setopt($theHeader, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8");
curl_setopt($theHeader, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($theHeader, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($theHeader, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($theHeader, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($theHeader, CURLOPT_TIMEOUT, 10);
curl_setopt($theHeader, CURLOPT_PROXY, $proxy);
$curlResponse = curl_exec($theHeader);
curl_close($theHeader); // release the handle; this function is called once per proxy tested
if ($curlResponse === false)
{
return 9999999;
}
else
{
return (time() - $loadingtime);
}
}
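Because time() only has one-second resolution, two proxies that answer within the same second get the same score in proxy_speed(). A minimal sketch of sub-second timing with microtime(true) follows; `time_call()` is a hypothetical helper, and the usleep() call merely stands in for the cURL request.

```php
<?php
// Hypothetical helper: times any callable with sub-second resolution.
function time_call(callable $fn) {
    $start = microtime(true); // float seconds, including the fractional part
    $result = $fn();
    return array($result, microtime(true) - $start);
}

// usleep() is a stand-in for the real request.
list($value, $seconds) = time_call(function () {
    usleep(20000); // 20 ms
    return 'ok';
});
```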
function curl($url, $proxy) {
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE, // Setting cURL's option to return the webpage data
CURLOPT_FOLLOWLOCATION => TRUE, // Setting cURL to follow 'location' HTTP headers
CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer where following 'location' HTTP headers
CURLOPT_CONNECTTIMEOUT => 300, // Maximum time (in seconds) to wait for the connection to be established
CURLOPT_TIMEOUT => 300, // Maximum total time (in seconds) the whole request may take
CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8", // Setting the useragent
CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
CURLOPT_HTTPPROXYTUNNEL => 1,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_SSL_VERIFYHOST => false,
CURLOPT_PROXY => $proxy
);
$ch = curl_init(); // Initialising cURL
curl_setopt_array($ch, $options); // Setting cURL's options using the previously assigned array data in $options
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
if ($data === false) $data = curl_error($ch);
$httpCode = curl_getinfo($ch , CURLINFO_HTTP_CODE); // Only meaningful after curl_exec() has run
curl_close($ch); // Close the handle before returning; code placed after return never runs
return stripslashes($data);
}
Does anyone know what is going on here? Is my web host timing out the script? Thanks!