Question

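I need to download more than 100,000 images. The images are in .png, .jpg, .jpeg, and .gif formats. I have permission to use these pictures, and the owners have provided me with an XML file containing all the URLs.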

The URLs have this structure:

    otherdomain/productimages/code/imagename.jpg/.png/.gif

I have all the codes in a PHP array called $codes[]. I also have the full paths to all the images in an array called $images[].

I need to download all of these images and keep the same structure:

    mydomain/productimages/code/imagename.jpg/.png/.gif

This is what I have so far, based on my research on the internet:

Loop through all the pages (one per hotel code):

    $i = 1;
    $r = 100000;

    while ($i < $r) {
        $html = get_data('http://otherdomain.com/productimages/'.$codes[$i].'/');
        getImages($html);
        $codes[$i++];
    }

    function getImages($html) {
        $matches = array();
        $regex = '~http://otherdomain.com/productimages/(.*?)\.jpg~i';
        preg_match_all($regex, $html, $matches);
        foreach ($matches[1] as $img) {
            saveImg($img);
        }
    }

    function saveImg($name) {
        $url = 'http://otherdomain.com/productimages/'.$name.'.jpg';
        $data = get_data($url);
        file_put_contents('photos/'.$name.'.jpg', $data);
    }

Can you help me get this working? The script does not work at all.

Answer 1

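I would suggest an easier and faster way to do this. Write all the full URLs to list.txt, one per line, then run the command wget -x -i list.txt. It downloads every image and places it in a directory that mirrors the site's structure.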
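
Since the question says the full URLs are already in $images[], producing list.txt could look roughly like the sketch below. It is only an illustration: the sample URLs and the helper name writeUrlList are made up for this example, and the added -nH flag (which stops wget from creating a top-level directory named after the host) is a suggestion beyond the original command.

```php
<?php
// Hypothetical sample data; in the question, $images already holds the full URLs.
$images = [
    'http://otherdomain.com/productimages/1001/front.jpg',
    'http://otherdomain.com/productimages/1002/pool.png',
];

/**
 * Write one URL per line to $listFile, skipping empty and duplicate entries.
 * Returns the number of URLs written. (Helper name is illustrative only.)
 */
function writeUrlList(array $urls, string $listFile): int
{
    $urls = array_map('trim', $urls);
    $urls = array_filter($urls, fn($u) => $u !== '');
    $urls = array_values(array_unique($urls));

    file_put_contents($listFile, implode("\n", $urls) . "\n");
    return count($urls);
}

$count = writeUrlList($images, 'list.txt');
echo "Wrote $count URLs to list.txt\n";

// Then, from the shell:
//   wget -x -nH -i list.txt
// -x   recreates the remote directory hierarchy locally
// -nH  omits the host-name directory, so paths start at productimages/
// -i   reads the URLs from the given file
```

Run from the web root of mydomain, this should leave the files under productimages/code/imagename.ext, matching the layout the question asks for.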
