Question

I'm trying to create a cron job that downloads image files stored in a queue in our database.

All of the functions we use work fine when run on our web server, but when I run the cron job with the following command:

php index.php cron image_download

I get a "Segmentation fault" error.

Debugging the cron job shows that the error occurs when the data is passed to the get_url_content function, which is called here:

foreach($urls as $url){

    $content = $this->get_url_content($url); 
}

Here is the function:

function get_url_content($url){
    $agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL,$url);
    return curl_exec($ch);
}

Is there a better way to download these files? Could a different approach avoid the same segmentation fault? Thanks!

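Update: The various approaches I'm trying all seem to keep causing problems. The cron job returns either "Segmentation fault" or "Killed". Someone suggested I look into using Iron.io for this, so I'm going to check that out. If anyone has other suggestions on how best to handle this, I'd really appreciate the information. Thanks.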

Answer 1

You could try this approach, but first: are you passing it the full URL?

function get_content($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, false);
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $result = curl_exec($ch);
    curl_close($ch);
    return($result);
}

function save_content($text,$new_filename){
    $fp = fopen($new_filename, 'w');
    fwrite($fp, $text);
    fclose($fp);
}

// replace this with your array of urls from the database (make sure it is an array)
$urls = ['http://domain.com/path/to/file.zip', 'http://another.com/path/to/image.img'];

foreach($urls as $url){
    $new_filename = basename($url);
    $temp = get_content($url);
    save_content($temp,$new_filename);
}

This fetches each file by its full URL and saves the contents to disk.
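One caveat with basename($url): if a URL carries a query string (for example image.jpg?size=large), the query ends up in the saved file name. A minimal sketch of a safer helper, assuming you only want the last path segment and a hash as a fallback when the path has no usable name; filename_from_url() is a hypothetical name, not part of the original answer:

```php
<?php
// Hypothetical helper: derive a local file name from a URL.
// Only the path component is used, so "?size=large" is dropped.
function filename_from_url(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);   // null if the URL has no path
    $name = is_string($path) ? basename($path) : '';
    // Replace anything outside a conservative character set.
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    if ($name === '' || $name === '.' || $name === '..') {
        // No usable name in the path: fall back to a hash of the whole URL.
        $name = md5($url);
    }
    return $name;
}

echo filename_from_url('http://domain.com/path/to/image.jpg?size=large'), "\n"; // image.jpg
```

In the loop above, $new_filename = filename_from_url($url); would replace the basename($url) call.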

If you're not restricted to cURL, you could try something like this:

$urls = ['http://domain.com/path/to/file.zip', 'http://another.com/path/to/image.img'];

foreach($urls as $url){
    $new_filename = basename($url);
    // or fopen can be file_get_contents: file_get_contents($url)
    file_put_contents($new_filename, fopen($url, 'r'));
}
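The fopen() variant works because file_put_contents() accepts a stream resource and copies the remaining contents of that stream into the file, similar to stream_copy_to_stream(). A slightly more defensive sketch, assuming allow_url_fopen is enabled for remote URLs, checks both handles and returns false instead of silently writing an empty file; copy_stream_to_file() is a hypothetical name:

```php
<?php
// Hypothetical helper: copy a URL (or any path fopen() can read) to a local file.
// Returns the number of bytes copied, or false on failure.
function copy_stream_to_file(string $source, string $dest)
{
    $in = @fopen($source, 'rb');
    if ($in === false) {
        return false; // bad URL, network error, or allow_url_fopen disabled
    }
    $out = @fopen($dest, 'wb');
    if ($out === false) {
        fclose($in);
        return false; // destination not writable
    }
    // Streams the data across rather than building one large string in memory.
    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    if ($bytes === false) {
        @unlink($dest); // do not leave a partial file behind
    }
    return $bytes;
}

// Usage with the $urls array from above:
// foreach ($urls as $url) {
//     if (copy_stream_to_file($url, basename($url)) === false) {
//         error_log("Download failed: $url");
//     }
// }
```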

Or even:

foreach($urls as $url){
    $new_filename = basename($url);
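    // escapeshellarg() stops characters such as & or ; in a URL from being run by the shell
    shell_exec('wget ' . escapeshellarg($url) . ' -O ' . escapeshellarg($new_filename));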
}
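With the shell approach, URLs coming from a database should be passed through escapeshellarg(), and the exit status is worth checking, since shell_exec() does not report it. A minimal sketch using exec(), which does return the exit status; wget_command() and wget_download() are hypothetical names, and -q (quiet) is just one reasonable choice for a cron job:

```php
<?php
// Hypothetical helper: build a wget command with every argument shell-escaped.
function wget_command(string $url, string $dest): string
{
    return 'wget -q -O ' . escapeshellarg($dest) . ' ' . escapeshellarg($url);
}

// Run wget and report success based on its exit status (0 means success).
function wget_download(string $url, string $dest): bool
{
    $output = [];
    $status = 0;
    exec(wget_command($url, $dest) . ' 2>&1', $output, $status);
    if ($status !== 0) {
        error_log("wget exited with $status for $url: " . implode("\n", $output));
        @unlink($dest); // -O may have created an empty file before the failure
        return false;
    }
    return true;
}
```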

Answer 2

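Use the cURL option CURLOPT_FILE to have cURL write the download directly into a file, so the response is streamed to disk instead of being held in memory as a string. For this case I've commented out two of the options from your existing code. Here is your modified function: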

function get_url_content($url, $file){
    $fp = fopen ($file, 'w+');                   // open file handle

    $agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    // curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FILE, $fp);          // output to file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // handle redirect
    // curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL,$url);
    curl_exec($ch);
    fclose($fp);                                  // closing file handle
}

Note that I've added a second parameter to the function ($file, the target file name). Just pass it your URL and the file path (an absolute path, of course).
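The function above does not check whether the transfer succeeded, so a failed request can leave an empty or partial file on disk. A sketch of the same CURLOPT_FILE idea with error handling, assuming PHP 7 or later with the curl extension; download_to_file(), the timeout values, and deleting the file on failure are illustrative choices, not part of the original answer. CURLOPT_FAILONERROR makes cURL treat HTTP status codes of 400 and above as errors:

```php
<?php
// Hypothetical helper: stream a URL straight to $file with CURLOPT_FILE.
// Returns true on success; on failure the partial file is removed.
function download_to_file(string $url, string $file): bool
{
    $fp = fopen($file, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FILE           => $fp,   // write the body to the file, not to memory
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_FAILONERROR    => true,  // treat HTTP >= 400 as a failure
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds (illustrative value)
        CURLOPT_TIMEOUT        => 120,   // seconds for the whole transfer (illustrative)
    ]);
    $ok = curl_exec($ch);
    if ($ok === false) {
        error_log('Download failed for ' . $url . ': ' . curl_error($ch));
    }
    unset($ch);   // release the handle before closing the file (curl_close() has no effect in PHP 8)
    fclose($fp);
    if ($ok === false) {
        @unlink($file);
    }
    return $ok !== false;
}
```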

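If you're comfortable with shell scripting, you can download the files with curl's command-line options instead. For example, this command downloads an image to the specified file: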

curl -s -L "http://img_url/" -o /var/path/image.jpeg
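If you call the curl command line from PHP (for example from the same cron job), the same escaping concern applies. A minimal sketch, assuming the curl binary is on the PATH; curl_cli_command() and curl_cli_download() are hypothetical names. It adds -f (--fail) so that an HTTP error such as 404 produces a non-zero exit status instead of saving the error page as the image, and -S so errors are still printed despite -s:

```php
<?php
// Hypothetical helper: build the curl command line with escaped arguments.
function curl_cli_command(string $url, string $dest): string
{
    // -s silent, -S still show errors, -f fail on HTTP errors, -L follow redirects
    return 'curl -sSfL ' . escapeshellarg($url) . ' -o ' . escapeshellarg($dest);
}

// Run it and treat any non-zero exit status as a failure.
function curl_cli_download(string $url, string $dest): bool
{
    $output = [];
    $status = 0;
    exec(curl_cli_command($url, $dest) . ' 2>&1', $output, $status);
    return $status === 0;
}
```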

Answer 3

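I spent a long time researching other ways to do this kind of multi-file download before settling on a simple solution: ZipArchive. It lets you create and open a zip file, add files to it, and then close it. You can then create a web link to the archive.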
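A minimal sketch of the ZipArchive approach described above, assuming the zip extension is installed and the images have already been saved to local paths; zip_files() and any archive path you pass it are hypothetical:

```php
<?php
// Hypothetical helper: bundle local files into one zip archive.
// Returns true if the archive was written successfully.
function zip_files(array $paths, string $zipPath): bool
{
    $zip = new ZipArchive();
    // CREATE makes a new archive; OVERWRITE replaces an existing one.
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    foreach ($paths as $path) {
        if (is_file($path)) {
            // Store each file under its base name, without the directory structure.
            $zip->addFile($path, basename($path));
        }
    }
    return $zip->close();
}
```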

Best way to automate bulk file downloads
