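I am using a PHP script to download an XML file from an external URL with cURL, but I am running into a problem: cURL sometimes fails to download the complete file. The problem happens more often when the script is run by cron on my host server.
Here is the script: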
<?php
header('Content-type: text/html; charset=utf-8');
// Initialize the number of download attempts
$xml_dl_attempts = 0;
// Choose the first unused output filename: xml0.xml, xml1.xml, ...
$findex = 0;
while (file_exists("xml" . $findex . ".xml"))
{
    $findex++;
}
$filename = "xml" . $findex . ".xml";
// Filename for the log file
$logfilename = "log.txt";
// Open the log file for appending
$logfileout = fopen($logfilename, 'a');
fwrite($logfileout, "Starting attempts to download the xml file at " . date("H:i:s Y-m-d") . "\r\n");
// Attempt to download the xml file up to 8 times
do {
    // Sleep 3 seconds before retrying the download
    if ($xml_dl_attempts > 0) sleep(3);
    // Increase the number of download attempts
    $xml_dl_attempts++;
    // Write to the log file
    fwrite($logfileout, date("H:i:s Y-m-d") . ": Download attempt number " . $xml_dl_attempts . ": ");
    // Download the xml file using curl
    $ch = curl_init();
    $url = 'http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.xml?localeId=el_GR';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    // CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3, and
    // CURLOPT_RETURNTRANSFER is redundant once CURLOPT_FILE is set, so neither is used.
    set_time_limit(300);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300);
    $outfile = fopen($filename, 'w');
    if (!$outfile)
    {
        // Log the reason and release resources before giving up
        fwrite($logfileout, "cannot open " . $filename . " for writing\r\n");
        fclose($logfileout);
        curl_close($ch);
        exit;
    }
    // Write the response body straight to the output file
    curl_setopt($ch, CURLOPT_FILE, $outfile);
    if (curl_exec($ch) === false)
    {
        fwrite($logfileout, "curl_error: " . curl_error($ch) . " ");
    }
    fclose($outfile);
    curl_close($ch);
    // Collect libxml errors internally and clear any old ones
    libxml_use_internal_errors(true);
    libxml_clear_errors();
    // Parse the xml file
    $xml = simplexml_load_file($filename);
    // Check for parse errors
    if ($err = libxml_get_last_error())
    {
        fwrite($logfileout, "failed\r\n");
    }
} while ($err !== false && $xml_dl_attempts < 8); // repeat if the xml was not completely downloaded
// Report success if the last attempt parsed without errors
if (!$err)
{
    fwrite($logfileout, "successfull\r\n");
}
fwrite($logfileout, "End.\r\n");
fclose($logfileout);
?>
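As you can see, I check whether the SimpleXML parser reports an error when it parses the downloaded XML file. If an error occurs, I repeat the process, up to 8 times. I also write a log file.
Here is the log file for one whole day: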
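The same completeness check can be run on the response body in memory, before anything is written to disk. A minimal sketch, assuming the body has been fetched as a string (for example with `CURLOPT_RETURNTRANSFER`); the helper name `is_well_formed_xml` is hypothetical:

```php
<?php
// Hypothetical helper: true only if $data parses as XML with no libxml errors.
function is_well_formed_xml(string $data): bool
{
    if ($data === '') {
        return false; // an empty body is never a complete document
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $xml = simplexml_load_string($data);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    // Restore the caller's error-handling setting.
    libxml_use_internal_errors($previous);
    return $xml !== false && count($errors) === 0;
}
```

Writing the file only after this returns true would avoid leaving a truncated `xmlN.xml` behind when every attempt fails.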
Starting attempts to download the xml file at 18:35:00 2012-09-25
18:35:00 2012-09-25: Download attempt number : failed
18:35:03 2012-09-25: Download attempt number : failed
18:35:07 2012-09-25: Download attempt number : successfull
End.
Starting attempts to download the xml file at 19:35:00 2012-09-25
19:35:00 2012-09-25: Download attempt number 1: failed
19:35:03 2012-09-25: Download attempt number 2: failed
19:35:06 2012-09-25: Download attempt number 3: failed
19:35:10 2012-09-25: Download attempt number 4: failed
19:35:13 2012-09-25: Download attempt number 5: failed
19:35:16 2012-09-25: Download attempt number 6: failed
19:35:20 2012-09-25: Download attempt number 7: failed
19:35:23 2012-09-25: Download attempt number 8: successfull
End.
Starting attempts to download the xml file at 20:35:00 2012-09-25
20:35:00 2012-09-25: Download attempt number 1: failed
20:35:04 2012-09-25: Download attempt number 2: failed
20:35:08 2012-09-25: Download attempt number 3: successfull
End.
Starting attempts to download the xml file at 21:35:00 2012-09-25
21:35:00 2012-09-25: Download attempt number 1: failed
21:35:04 2012-09-25: Download attempt number 2: failed
21:35:07 2012-09-25: Download attempt number 3: failed
21:35:11 2012-09-25: Download attempt number 4: successfull
End.
Starting attempts to download the xml file at 22:35:00 2012-09-25
22:35:00 2012-09-25: Download attempt number 1: failed
22:35:04 2012-09-25: Download attempt number 2: failed
22:35:07 2012-09-25: Download attempt number 3: successfull
End.
Starting attempts to download the xml file at 23:35:00 2012-09-25
23:35:00 2012-09-25: Download attempt number 1: failed
23:35:03 2012-09-25: Download attempt number 2: failed
23:35:07 2012-09-25: Download attempt number 3: failed
23:35:10 2012-09-25: Download attempt number 4: failed
23:35:14 2012-09-25: Download attempt number 5: failed
23:35:17 2012-09-25: Download attempt number 6: failed
23:35:21 2012-09-25: Download attempt number 7: successfull
End.
Starting attempts to download the xml file at 00:35:00 2012-09-26
00:35:00 2012-09-26: Download attempt number 1: successfull
End.
Starting attempts to download the xml file at 01:35:00 2012-09-26
01:35:00 2012-09-26: Download attempt number 1: failed
01:35:04 2012-09-26: Download attempt number 2: failed
01:35:07 2012-09-26: Download attempt number 3: failed
01:35:11 2012-09-26: Download attempt number 4: failed
01:35:14 2012-09-26: Download attempt number 5: failed
01:35:18 2012-09-26: Download attempt number 6: failed
01:35:21 2012-09-26: Download attempt number 7: failed
01:35:30 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 02:35:00 2012-09-26
02:35:00 2012-09-26: Download attempt number 1: failed
02:35:03 2012-09-26: Download attempt number 2: failed
02:35:07 2012-09-26: Download attempt number 3: failed
02:35:10 2012-09-26: Download attempt number 4: failed
02:35:13 2012-09-26: Download attempt number 5: failed
02:35:17 2012-09-26: Download attempt number 6: failed
02:35:20 2012-09-26: Download attempt number 7: failed
02:35:24 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 03:35:00 2012-09-26
03:35:00 2012-09-26: Download attempt number 1: failed
03:35:04 2012-09-26: Download attempt number 2: failed
03:35:07 2012-09-26: Download attempt number 3: failed
03:35:10 2012-09-26: Download attempt number 4: failed
03:35:14 2012-09-26: Download attempt number 5: failed
03:35:17 2012-09-26: Download attempt number 6: failed
03:35:21 2012-09-26: Download attempt number 7: failed
03:35:30 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 04:35:00 2012-09-26
04:35:00 2012-09-26: Download attempt number 1: failed
04:35:03 2012-09-26: Download attempt number 2: failed
04:35:07 2012-09-26: Download attempt number 3: failed
04:35:10 2012-09-26: Download attempt number 4: failed
04:35:14 2012-09-26: Download attempt number 5: failed
04:35:17 2012-09-26: Download attempt number 6: failed
04:35:21 2012-09-26: Download attempt number 7: failed
04:35:24 2012-09-26: Download attempt number 8: successfull
End.
Starting attempts to download the xml file at 05:35:00 2012-09-26
05:35:00 2012-09-26: Download attempt number 1: failed
05:35:04 2012-09-26: Download attempt number 2: failed
05:35:08 2012-09-26: Download attempt number 3: failed
05:35:11 2012-09-26: Download attempt number 4: failed
05:35:15 2012-09-26: Download attempt number 5: failed
05:35:18 2012-09-26: Download attempt number 6: failed
05:35:22 2012-09-26: Download attempt number 7: failed
05:35:25 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 06:35:00 2012-09-26
06:35:00 2012-09-26: Download attempt number 1: failed
06:35:03 2012-09-26: Download attempt number 2: failed
06:35:07 2012-09-26: Download attempt number 3: failed
06:35:10 2012-09-26: Download attempt number 4: failed
06:35:14 2012-09-26: Download attempt number 5: failed
06:35:17 2012-09-26: Download attempt number 6: failed
06:35:21 2012-09-26: Download attempt number 7: failed
06:35:24 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 07:35:00 2012-09-26
07:35:00 2012-09-26: Download attempt number 1: failed
07:35:04 2012-09-26: Download attempt number 2: failed
07:35:07 2012-09-26: Download attempt number 3: failed
07:35:11 2012-09-26: Download attempt number 4: failed
07:35:14 2012-09-26: Download attempt number 5: failed
07:35:18 2012-09-26: Download attempt number 6: failed
07:35:21 2012-09-26: Download attempt number 7: failed
07:35:24 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 08:35:00 2012-09-26
08:35:00 2012-09-26: Download attempt number 1: failed
08:35:03 2012-09-26: Download attempt number 2: failed
08:35:06 2012-09-26: Download attempt number 3: failed
08:35:10 2012-09-26: Download attempt number 4: failed
08:35:13 2012-09-26: Download attempt number 5: failed
08:35:16 2012-09-26: Download attempt number 6: failed
08:35:20 2012-09-26: Download attempt number 7: failed
08:35:23 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 09:35:00 2012-09-26
09:35:00 2012-09-26: Download attempt number 1: failed
09:35:04 2012-09-26: Download attempt number 2: failed
09:35:07 2012-09-26: Download attempt number 3: successfull
End.
Starting attempts to download the xml file at 10:35:00 2012-09-26
10:35:00 2012-09-26: Download attempt number 1: failed
10:35:03 2012-09-26: Download attempt number 2: failed
10:35:06 2012-09-26: Download attempt number 3: failed
10:35:10 2012-09-26: Download attempt number 4: failed
10:35:13 2012-09-26: Download attempt number 5: failed
10:35:17 2012-09-26: Download attempt number 6: failed
10:35:20 2012-09-26: Download attempt number 7: successfull
End.
Starting attempts to download the xml file at 11:35:00 2012-09-26
11:35:00 2012-09-26: Download attempt number 1: failed
11:35:03 2012-09-26: Download attempt number 2: failed
11:35:07 2012-09-26: Download attempt number 3: successfull
End.
Starting attempts to download the xml file at 12:35:00 2012-09-26
12:35:00 2012-09-26: Download attempt number 1: failed
12:35:04 2012-09-26: Download attempt number 2: failed
12:35:07 2012-09-26: Download attempt number 3: failed
12:35:11 2012-09-26: Download attempt number 4: failed
12:35:14 2012-09-26: Download attempt number 5: failed
12:35:17 2012-09-26: Download attempt number 6: failed
12:35:21 2012-09-26: Download attempt number 7: successfull
End.
Starting attempts to download the xml file at 13:35:00 2012-09-26
13:35:00 2012-09-26: Download attempt number 1: failed
13:35:03 2012-09-26: Download attempt number 2: successfull
End.
Starting attempts to download the xml file at 14:35:00 2012-09-26
14:35:00 2012-09-26: Download attempt number 1: failed
14:35:03 2012-09-26: Download attempt number 2: failed
14:35:07 2012-09-26: Download attempt number 3: failed
14:35:10 2012-09-26: Download attempt number 4: successfull
End.
Starting attempts to download the xml file at 15:35:00 2012-09-26
15:35:00 2012-09-26: Download attempt number 1: failed
15:35:03 2012-09-26: Download attempt number 2: failed
15:35:07 2012-09-26: Download attempt number 3: failed
15:35:10 2012-09-26: Download attempt number 4: failed
15:35:13 2012-09-26: Download attempt number 5: failed
15:35:17 2012-09-26: Download attempt number 6: failed
15:35:20 2012-09-26: Download attempt number 7: failed
15:35:24 2012-09-26: Download attempt number 8: failed
End.
Starting attempts to download the xml file at 16:35:00 2012-09-26
16:35:00 2012-09-26: Download attempt number 1: failed
16:35:03 2012-09-26: Download attempt number 2: failed
16:35:07 2012-09-26: Download attempt number 3: successfull
End.
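The problem is that sometimes it manages to get the complete file after a few attempts, and sometimes it fails completely. Note also that `curl_exec` does not return an error when the XML is incomplete.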
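cURL can only flag a short transfer if it knows how long the body should be. If the server sends a `Content-Length` header and then closes the connection early, libcurl normally fails with `CURLE_PARTIAL_FILE` (error 18) and `curl_exec` returns `false`. If the body is delimited only by the server closing the connection (no `Content-Length` and no chunked encoding), an early close looks like a normal end of the response, which is one possible explanation for the missing error here. A minimal sketch of an explicit check on the array returned by `curl_getinfo($ch)`, using its `download_content_length` and `size_download` keys; the helper name is hypothetical:

```php
<?php
// Hypothetical helper: inspects curl_getinfo($ch) output after curl_exec().
// Returns true if the byte count reaches Content-Length, false if the body
// is shorter, and null if the server did not announce a length (-1).
function transfer_is_complete(array $info): ?bool
{
    $expected = $info['download_content_length'] ?? -1;
    $received = $info['size_download'] ?? 0;
    if ($expected < 0) {
        return null; // length unknown: only parsing the XML can tell
    }
    return $received >= $expected;
}
```

Calling `transfer_is_complete(curl_getinfo($ch))` before `curl_close($ch)` and logging a `null` result would show whether this server sends a length at all.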
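Unfortunately, the server hosting the XML does not support range requests, so I cannot resume an incomplete download. I could raise the attempt limit to, say, 50, but failed attempts still download part of the file. For a 1 MB XML file, 30 failed attempts of about 500 KB each plus the one successful attempt add up to 30 × 500 KB + 1 MB ≈ 16 MB for a single file. I want to run this script every hour, so I believe this would eat into my server's bandwidth.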
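Whether a server advertises byte ranges can be read from its response headers: `Accept-Ranges: bytes` means it does, while `Accept-Ranges: none` or a missing header suggests it does not (absence alone is not conclusive). A minimal sketch with a hypothetical helper that scans raw header text, such as the output of a request made with `CURLOPT_NOBODY` and `CURLOPT_HEADER` set to `true`:

```php
<?php
// Hypothetical helper: true if the raw header block advertises byte ranges.
function advertises_byte_ranges(string $rawHeaders): bool
{
    // Header names are case-insensitive; /m lets ^ match at the start of each line.
    return preg_match('/^Accept-Ranges:\s*bytes\b/mi', $rawHeaders) === 1;
}

$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\nAccept-Ranges: none\r\n\r\n";
var_dump(advertises_byte_ranges($headers)); // bool(false)
```

If it did return true, `CURLOPT_RANGE` (for example `'500000-'`) could request only the missing tail of the file.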
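Why does cURL fail to download the complete file? Is there some option that makes it behave like a browser, so that it always gets the file in the end?
Thanks.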
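No cURL option can guarantee a complete download if the server drops the connection, but a request can be made to look more like a browser's. A hedged sketch: the option names are standard PHP cURL constants, while the User-Agent string and timeout values are arbitrary assumptions. Setting `CURLOPT_ENCODING` to an empty string asks for any compression cURL supports, which, if the server honours it, would also shrink the data wasted on failed attempts.

```php
<?php
// Hypothetical helper: browser-like request options for curl_setopt_array().
// The User-Agent string and timeouts are illustrative assumptions.
function browser_like_options(): array
{
    return [
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; xml-fetcher)',
        CURLOPT_ENCODING       => '',   // send Accept-Encoding for all supported codings
        CURLOPT_FOLLOWLOCATION => true, // follow redirects as a browser would
        CURLOPT_FAILONERROR    => true, // treat HTTP status >= 400 as a cURL error
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_TIMEOUT        => 300,
    ];
}

// Usage (requires the curl extension):
// $ch = curl_init($url);
// curl_setopt_array($ch, browser_like_options());
```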
Answer (score: 1)
The problem lies with your source: the server.
I tried running your scraper on ScraperWiki, and this is what it showed:
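Also, when I tried loading the XML myself, I hit the same problem; it worked on the third attempt.
In the following images you can see that the server closes the connection on the first two requests, but not on the third (successful) one.
So the problem is on the server, and unless the server is yours, there is nothing you can do about it (except, of course, bringing it to the attention of the server's administrators!).
Note: I believe ScraperWiki has a good internet connection, since many people rely on it, so you can safely put this down to a fault on the server #jboss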
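To confirm the same behaviour from the cron job, each attempt could log cURL's error number and transfer details instead of only "failed". A minimal sketch; the helper name and log format are hypothetical, and the array keys are those returned by `curl_getinfo($ch)`:

```php
<?php
// Hypothetical helper: one log line per attempt with cURL's view of the transfer.
function describe_attempt(int $attempt, int $errno, string $error, array $info): string
{
    return sprintf(
        "attempt %d: errno=%d (%s) http=%d bytes=%d expected=%d\r\n",
        $attempt,
        $errno,
        $error === '' ? 'no error' : $error,
        (int) ($info['http_code'] ?? 0),
        (int) ($info['size_download'] ?? 0),
        (int) ($info['download_content_length'] ?? -1)
    );
}

// In the question's loop, before curl_close($ch):
// fwrite($logfileout, describe_attempt($xml_dl_attempts, curl_errno($ch), curl_error($ch), curl_getinfo($ch)));
```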