Question

我编写了一个脚本，以从数据库中获取所有图像记录，使用每个图像的临时名称在磁盘上找到图片，将其复制到新文件夹，并从这些文件中创建tar文件。为此，我正在使用PHP的Python class。问题在于图像是TIFF文件，并且文件很大（整个2000张图像的文件夹大小为95gb）。

最初，我创建了一个档案，遍历所有数据库记录以查找特定文件，并使用PharData::addFile()将每个文件分别添加到档案中，但这最终导致一个文件的添加需要15秒钟以上的时间。 / p>

我现在已切换为分批使用PharData::buildFromDirectory，这明显更快，但是创建批处理的时间随着每批的增加而增加。第一批在28秒内完成，第二批在110秒内完成，第三批甚至没有完成。代码；

$imageLocation = '/path/to/imagefolder';
$copyLocation = '/path/to/backupfolder';

$images = [images];

$archive = new PharData($copyLocation . '/images.tar');

// Set timeout to an hour (adding 2000 files to a tar archive takes time apparently), should be enough
set_time_limit(3600);

$time = microtime(true);
$inCurrentBatch = 0;
$perBatch = 100; // Amount of files to be included in one archive file
$archiveNumber = 1;

foreach ($images as $image) {
    $path = $imageLocation . '/' . $image->getTempname();

    // If the file exists, copy to folder with proper file name
    if (file_exists($path)) {
        $copyName = $image->getFilename() . '.tiff';
        $copyPath = $copyLocation . '/' . $copyName;
        copy($path, $copyPath);
        $inCurrentBatch++;

        // If the current batch reached the limit, add all files to the archive and remove .tiff files
        if ($inCurrentBatch === $perBatch) {
            $archive = new PharData($copyLocation . "/images_{$archiveNumber}.tar");
            $archive->buildFromDirectory($copyLocation);
            array_map('unlink', glob("{$copyLocation}/*.tiff"));
            $inCurrentBatch = 0;
            $archiveNumber++;
        }
    }
}

// Archive any leftover files in a last archive
if (glob("{$copyLocation}/*.tiff")) {
    $archive = new PharData($copyLocation . "/images_{$archiveNumber}.tar");
    $archive->buildFromDirectory($copyLocation);
    array_map('unlink', glob("{$copyLocation}/*.tiff"));
}

$taken = microtime(true) - $time;
echo "Done in {$taken} seconds\n";
exit(0);

在批次之间删除复制的图像以节省磁盘空间。

我们对整个脚本花了一段时间感到满意，但我不明白为什么在两次批处理之间创建存档的时间会增加这么多。

批量创建多个tar文件的持续时间随着每个批次的增加而增加

0 个答案: