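Curl (PHP script) downloads incomplete files

Time: 2012-09-26 17:15:27

Tags: php curl download

I am using a PHP script to download an xml file from an external URL with curl, but I am running into a problem. Curl sometimes fails to download the complete file. The problem occurs more often when I run the script through cron on my hosting server.

Here is the script: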

<?php
header('Content-type:text/html; charset=utf-8');

//Initialize the count of xml download attempts
$xml_dl_attempts = 0;

//set filename of output xml file
$findex = 0;
while(file_exists("xml".$findex.".xml"))
{
    $findex++;
}
$filename = "xml".$findex.".xml";

//Filename for log file
$logfilename = "log.txt";

//Open (append) logfile for write.
$logfileout = fopen($logfilename, 'a');
fwrite($logfileout, "Starting attempts to download the xml file at ".date("H:i:s Y-m-d")."\r\n");

//Attempt to download xml file 8 times
do {
    //Sleep 3 seconds before retrying download
    if($xml_dl_attempts > 0 ) sleep(3);

    //Increase number of download attempts
    $xml_dl_attempts++;
    //Write to logfile
    fwrite($logfileout, date("H:i:s Y-m-d").": Download attempt number ".$xml_dl_attempts.": ");

    //Download xml file using curl
    $ch = curl_init();
    $url = 'http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.xml?localeId=el_GR';

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    set_time_limit(300); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 300);

    $outfile = fopen($filename, 'w');
    if (!$outfile)
    {
        exit;
    }
    curl_setopt($ch, CURLOPT_FILE, $outfile);

    if(curl_exec($ch) === false)
    {
        fwrite($logfileout, "curl_error: ".curl_error($ch));
    }
    fclose($outfile);
    curl_close($ch);

    //Clear errors
    libxml_use_internal_errors(true);
    libxml_clear_errors();

    //Parse xml file
    $xml = simplexml_load_file($filename);

    //Check for errors
    if($err = libxml_get_last_error())
    {
        fwrite($logfileout, "failed\r\n");
    }
} while($err !== false && $xml_dl_attempts < 8); //repeat if xml was not completely downloaded

//Log success if the last attempt parsed without errors
if(!$err)
{
    fwrite($logfileout, "successfull\r\n");
}
fwrite($logfileout, "End.\r\n");
fclose($logfileout);
?>

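As you can see, after each download I check whether the simplexml parser reports an error on the downloaded xml file. If it does, I repeat the process, up to a limit of 8 attempts. I also write a log file.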
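The validation step described above can be isolated into a small function. This is only a sketch: the name `isWellFormedXml` is hypothetical and does not appear in the original script, and it assumes the SimpleXML and libxml extensions are available.

```php
<?php
// Hypothetical helper (not part of the original script): returns true when
// the file at $path parses as well-formed XML, false otherwise.
function isWellFormedXml(string $path): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_file($path);
    $errors = libxml_get_errors();

    // Restore the caller's libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml !== false && count($errors) === 0;
}
```

The retry loop could then call `isWellFormedXml($filename)` instead of inspecting `libxml_get_last_error()` directly, which keeps the download and the validation separate.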

Here is the log file for a whole day:

Starting attempts to download the xml file at 18:35:00 2012-09-25

18:35:00 2012-09-25: Download attempt number : failed

18:35:03 2012-09-25: Download attempt number : failed

18:35:07 2012-09-25: Download attempt number : successfull

End.

Starting attempts to download the xml file at 19:35:00 2012-09-25

19:35:00 2012-09-25: Download attempt number 1: failed

19:35:03 2012-09-25: Download attempt number 2: failed

19:35:06 2012-09-25: Download attempt number 3: failed

19:35:10 2012-09-25: Download attempt number 4: failed

19:35:13 2012-09-25: Download attempt number 5: failed

19:35:16 2012-09-25: Download attempt number 6: failed

19:35:20 2012-09-25: Download attempt number 7: failed

19:35:23 2012-09-25: Download attempt number 8: successfull

End.

Starting attempts to download the xml file at 20:35:00 2012-09-25

20:35:00 2012-09-25: Download attempt number 1: failed

20:35:04 2012-09-25: Download attempt number 2: failed

20:35:08 2012-09-25: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 21:35:00 2012-09-25

21:35:00 2012-09-25: Download attempt number 1: failed

21:35:04 2012-09-25: Download attempt number 2: failed

21:35:07 2012-09-25: Download attempt number 3: failed

21:35:11 2012-09-25: Download attempt number 4: successfull

End.

Starting attempts to download the xml file at 22:35:00 2012-09-25

22:35:00 2012-09-25: Download attempt number 1: failed

22:35:04 2012-09-25: Download attempt number 2: failed

22:35:07 2012-09-25: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 23:35:00 2012-09-25

23:35:00 2012-09-25: Download attempt number 1: failed

23:35:03 2012-09-25: Download attempt number 2: failed

23:35:07 2012-09-25: Download attempt number 3: failed

23:35:10 2012-09-25: Download attempt number 4: failed

23:35:14 2012-09-25: Download attempt number 5: failed

23:35:17 2012-09-25: Download attempt number 6: failed

23:35:21 2012-09-25: Download attempt number 7: successfull

End.

Starting attempts to download the xml file at 00:35:00 2012-09-26

00:35:00 2012-09-26: Download attempt number 1: successfull

End.

Starting attempts to download the xml file at 01:35:00 2012-09-26

01:35:00 2012-09-26: Download attempt number 1: failed

01:35:04 2012-09-26: Download attempt number 2: failed

01:35:07 2012-09-26: Download attempt number 3: failed

01:35:11 2012-09-26: Download attempt number 4: failed

01:35:14 2012-09-26: Download attempt number 5: failed

01:35:18 2012-09-26: Download attempt number 6: failed

01:35:21 2012-09-26: Download attempt number 7: failed

01:35:30 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 02:35:00 2012-09-26

02:35:00 2012-09-26: Download attempt number 1: failed

02:35:03 2012-09-26: Download attempt number 2: failed

02:35:07 2012-09-26: Download attempt number 3: failed

02:35:10 2012-09-26: Download attempt number 4: failed

02:35:13 2012-09-26: Download attempt number 5: failed

02:35:17 2012-09-26: Download attempt number 6: failed

02:35:20 2012-09-26: Download attempt number 7: failed

02:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 03:35:00 2012-09-26

03:35:00 2012-09-26: Download attempt number 1: failed

03:35:04 2012-09-26: Download attempt number 2: failed

03:35:07 2012-09-26: Download attempt number 3: failed

03:35:10 2012-09-26: Download attempt number 4: failed

03:35:14 2012-09-26: Download attempt number 5: failed

03:35:17 2012-09-26: Download attempt number 6: failed

03:35:21 2012-09-26: Download attempt number 7: failed

03:35:30 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 04:35:00 2012-09-26

04:35:00 2012-09-26: Download attempt number 1: failed

04:35:03 2012-09-26: Download attempt number 2: failed

04:35:07 2012-09-26: Download attempt number 3: failed

04:35:10 2012-09-26: Download attempt number 4: failed

04:35:14 2012-09-26: Download attempt number 5: failed

04:35:17 2012-09-26: Download attempt number 6: failed

04:35:21 2012-09-26: Download attempt number 7: failed

04:35:24 2012-09-26: Download attempt number 8: successfull

End.

Starting attempts to download the xml file at 05:35:00 2012-09-26

05:35:00 2012-09-26: Download attempt number 1: failed

05:35:04 2012-09-26: Download attempt number 2: failed

05:35:08 2012-09-26: Download attempt number 3: failed

05:35:11 2012-09-26: Download attempt number 4: failed

05:35:15 2012-09-26: Download attempt number 5: failed

05:35:18 2012-09-26: Download attempt number 6: failed

05:35:22 2012-09-26: Download attempt number 7: failed

05:35:25 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 06:35:00 2012-09-26

06:35:00 2012-09-26: Download attempt number 1: failed

06:35:03 2012-09-26: Download attempt number 2: failed

06:35:07 2012-09-26: Download attempt number 3: failed

06:35:10 2012-09-26: Download attempt number 4: failed

06:35:14 2012-09-26: Download attempt number 5: failed

06:35:17 2012-09-26: Download attempt number 6: failed

06:35:21 2012-09-26: Download attempt number 7: failed

06:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 07:35:00 2012-09-26

07:35:00 2012-09-26: Download attempt number 1: failed

07:35:04 2012-09-26: Download attempt number 2: failed

07:35:07 2012-09-26: Download attempt number 3: failed

07:35:11 2012-09-26: Download attempt number 4: failed

07:35:14 2012-09-26: Download attempt number 5: failed

07:35:18 2012-09-26: Download attempt number 6: failed

07:35:21 2012-09-26: Download attempt number 7: failed

07:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 08:35:00 2012-09-26

08:35:00 2012-09-26: Download attempt number 1: failed

08:35:03 2012-09-26: Download attempt number 2: failed

08:35:06 2012-09-26: Download attempt number 3: failed

08:35:10 2012-09-26: Download attempt number 4: failed

08:35:13 2012-09-26: Download attempt number 5: failed

08:35:16 2012-09-26: Download attempt number 6: failed

08:35:20 2012-09-26: Download attempt number 7: failed

08:35:23 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 09:35:00 2012-09-26

09:35:00 2012-09-26: Download attempt number 1: failed

09:35:04 2012-09-26: Download attempt number 2: failed

09:35:07 2012-09-26: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 10:35:00 2012-09-26

10:35:00 2012-09-26: Download attempt number 1: failed

10:35:03 2012-09-26: Download attempt number 2: failed

10:35:06 2012-09-26: Download attempt number 3: failed

10:35:10 2012-09-26: Download attempt number 4: failed

10:35:13 2012-09-26: Download attempt number 5: failed

10:35:17 2012-09-26: Download attempt number 6: failed

10:35:20 2012-09-26: Download attempt number 7: successfull

End.

Starting attempts to download the xml file at 11:35:00 2012-09-26

11:35:00 2012-09-26: Download attempt number 1: failed

11:35:03 2012-09-26: Download attempt number 2: failed

11:35:07 2012-09-26: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 12:35:00 2012-09-26

12:35:00 2012-09-26: Download attempt number 1: failed

12:35:04 2012-09-26: Download attempt number 2: failed

12:35:07 2012-09-26: Download attempt number 3: failed

12:35:11 2012-09-26: Download attempt number 4: failed

12:35:14 2012-09-26: Download attempt number 5: failed

12:35:17 2012-09-26: Download attempt number 6: failed

12:35:21 2012-09-26: Download attempt number 7: successfull

End.

Starting attempts to download the xml file at 13:35:00 2012-09-26

13:35:00 2012-09-26: Download attempt number 1: failed

13:35:03 2012-09-26: Download attempt number 2: successfull

End.

Starting attempts to download the xml file at 14:35:00 2012-09-26

14:35:00 2012-09-26: Download attempt number 1: failed

14:35:03 2012-09-26: Download attempt number 2: failed

14:35:07 2012-09-26: Download attempt number 3: failed

14:35:10 2012-09-26: Download attempt number 4: successfull

End.

Starting attempts to download the xml file at 15:35:00 2012-09-26

15:35:00 2012-09-26: Download attempt number 1: failed

15:35:03 2012-09-26: Download attempt number 2: failed

15:35:07 2012-09-26: Download attempt number 3: failed

15:35:10 2012-09-26: Download attempt number 4: failed

15:35:13 2012-09-26: Download attempt number 5: failed

15:35:17 2012-09-26: Download attempt number 6: failed

15:35:20 2012-09-26: Download attempt number 7: failed

15:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 16:35:00 2012-09-26

16:35:00 2012-09-26: Download attempt number 1: failed

16:35:03 2012-09-26: Download attempt number 2: failed

16:35:07 2012-09-26: Download attempt number 3: successfull

End.

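The problem is that sometimes it manages to get the complete file after a few attempts, and sometimes it fails completely. Another thing to note is that curl_exec does not return an error when the xml is incomplete.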
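Since curl_exec reports no error, one extra check that may help is to compare the size the server announced with the number of bytes actually received. The sketch below assumes the server sends a Content-Length header; the function names `transferLooksComplete` and `checkHandle` are hypothetical. If the header is absent, cURL reports -1, the function returns null, and the XML parse remains the only check.

```php
<?php
// Hypothetical helper: decide whether a transfer looks complete by comparing
// the size the server announced with the number of bytes actually received.
// Returns null when no size was announced (cURL reports -1 in that case).
function transferLooksComplete(float $announced, float $received): ?bool
{
    if ($announced < 0) {
        return null; // no Content-Length header: the check is inconclusive
    }
    return $received >= $announced;
}

// Sketch of how the two values could be read after curl_exec($ch) and
// before curl_close($ch) in the script above.
function checkHandle($ch): ?bool
{
    return transferLooksComplete(
        (float) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD),
        (float) curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD)
    );
}
```

Logging both values on each attempt would also show how much of the file arrives before the connection drops.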

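Unfortunately, the server that hosts the xml does not support ranges, so I cannot resume the file when it is incomplete. I could raise the attempt limit to, say, 50, but the failed attempts still download some data. For a 1 MB xml file, if the script fails 30 times and downloads 500 KB each time, it transfers 16 MB of data for one successful download. I want to run this script every hour, so I believe this would hurt my server's bandwidth.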
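The bandwidth estimate above is 30 × 0.5 MB + 1 MB = 16 MB. As a small sketch (the function name `totalTransferredMb` is hypothetical, and 500 KB is treated as 0.5 MB as in the estimate):

```php
<?php
// Total data transferred when every failed attempt still downloads part of
// the file before one attempt finally succeeds. All sizes are in megabytes.
function totalTransferredMb(int $failedAttempts, float $partialMb, float $fullMb): float
{
    return $failedAttempts * $partialMb + $fullMb;
}
```

Calling `totalTransferredMb(30, 0.5, 1.0)` reproduces the 16 MB figure. Run hourly, that worst case would add up to 24 × 16 = 384 MB per day.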

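Why does curl fail to download the complete file? Is there some option that makes curl behave like a browser, which always gets the file in the end?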
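For reference, this is roughly what "behaving like a browser" could look like in terms of curl options. It is a sketch only: the function name `browserLikeOptions`, the User-Agent string, and the Accept header are illustrative assumptions, and nothing here is known to change how this particular server responds.

```php
<?php
// Hypothetical helper: builds an option array for curl_setopt_array() that
// sends headers similar to those a browser would send.
function browserLikeOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        // Illustrative User-Agent value, not taken from the original post.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; xml-fetcher)',
        // Empty string: advertise every compression scheme this cURL build supports.
        CURLOPT_ENCODING       => '',
        CURLOPT_HTTPHEADER     => ['Accept: application/xml,text/xml;q=0.9,*/*;q=0.8'],
        CURLOPT_FOLLOWLOCATION => true,
        // Treat HTTP status codes of 400 and above as transfer errors.
        CURLOPT_FAILONERROR    => true,
        CURLOPT_TIMEOUT        => 300,
    ];
}
```

In the script this would be applied with `curl_setopt_array($ch, browserLikeOptions($url));` before setting CURLOPT_FILE. If the server supports compression, asking for it may also reduce the amount of data each attempt transfers.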

Thanks.

1 Answer:

Answer 0: (score: 1)

The problem is with your source: the server.

I tried running your scraper on scraperwiki, and this is what it showed:

1st screenshot

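I also hit the same problem when I tried to load the xml myself; it worked for me on the third try.

In the picture below you can see that the server closes the connection in the first two requests, but not in the third (successful) request.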

2nd screenshot

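So the problem is with the server, and if it is not yours there is nothing you can do about it. (Except, of course, bringing it to the attention of their server admin!)

Note: I trust scraperwiki to have a very good internet connection, since a lot of people depend on it. So you can safely put the blame on the server fault #jboss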