Question

I am using Google CoLab to run a small piece of code that iterates over 170K-190K images. It reads each one with cv2.imread from a path like gdrive/My Drive/folder1/img.jpg, resizes it with cv2.resize, and then writes it with cv2.imwrite to a path like gdrive/My Drive/folder2/img.jpg.

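But it is far too slow. I used to run the same code locally in Jupyter Notebook on Ubuntu 18.04 LTS, and the iterations were fast. Now each iteration takes nearly 30 seconds, which puts the estimated total run time at around 65 days!

What makes CoLab so slow here, and how can I make it faster?

Thanks.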

Answer 1

This is not an OpenCV problem but a Google Drive problem. Accessing files on Google Drive is very slow, even when you do it from Google Colaboratory.
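One way to check this claim on your own data is to time raw file reads, with no OpenCV involved. The sketch below is only an illustration: the function name `mean_read_seconds` and its parameters are made up for this example, and it measures plain byte reads rather than image decoding.

```python
import time
from pathlib import Path


def mean_read_seconds(folder, pattern="*.jpg", limit=20):
    """Return the average time in seconds to read up to `limit` files matching `pattern`."""
    files = sorted(Path(folder).glob(pattern))[:limit]
    if not files:
        raise ValueError(f"no files matching {pattern!r} in {folder}")
    start = time.perf_counter()
    for path in files:
        path.read_bytes()  # raw read only, no image decoding
    return (time.perf_counter() - start) / len(files)
```

Calling it once on the Drive folder (for example `mean_read_seconds('gdrive/My Drive/folder1')`) and once on a local copy of the same files should show whether the storage, rather than cv2, accounts for the time. Files that were read recently may be cached, so the first measurement is the more telling one.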

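A better approach is to copy the files to Colab's own storage. Once they are there, performance depends on the runtime you are using (CPU / GPU / TPU) and is much faster.

To copy the files, you can use the shutil library (Shutil documentation):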

import os
import shutil

# Copy a single file
shutil.copyfile('source_file_location', 'destination_file_location')

# Copy every .jpg file from one folder to another
for file in os.listdir('folder_path'):
    if file.endswith('.jpg'):
        shutil.copyfile(os.path.join('folder_path', file),
                        os.path.join('destination_folder_path', file))

# Copy a whole folder tree (the destination must not already exist)
shutil.copytree('source_path', 'destination_path')

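Because you have a very large number of files in a single Google Drive folder, accessing all of them may sometimes fail. It is better to keep the files in batches spread over multiple folders, which avoids some of the errors that occur when using Google Drive through Colab.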
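A minimal sketch of that batching idea, using only the standard library. The function name `split_into_batches`, the `batch_NNNN` folder naming, and the default batch size of 1000 are assumptions chosen for illustration, not something Drive or Colab requires.

```python
import shutil
from pathlib import Path


def split_into_batches(folder, batch_size=1000, pattern="*.jpg"):
    """Move matching files in `folder` into subfolders batch_0000, batch_0001, ...

    Returns the list of batch folders that were filled.
    """
    folder = Path(folder)
    files = sorted(p for p in folder.glob(pattern) if p.is_file())
    batches = []
    for start in range(0, len(files), batch_size):
        batch_dir = folder / f"batch_{start // batch_size:04d}"
        batch_dir.mkdir(exist_ok=True)
        for path in files[start:start + batch_size]:
            shutil.move(str(path), str(batch_dir / path.name))
        batches.append(batch_dir)
    return batches
```

Running this directly against a mounted Drive folder would itself be slow, so it is probably best done on a local copy before uploading.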

Answer 2

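The cause is the overhead of resolving paths on Drive and of reading from it.

The best way I have found to speed things up is to put the files you want to read on CoLab's own storage.

There are two ways to do this. The first is to use a Linux command inside the CoLab environment to copy the files from Drive to the home directory of the local storage, like !cp <source directory> <destination directory>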

For example, in our case (the -r flag is needed because imdb_crop is a directory): !cp -r "gdrive/My Drive/datasets/imdb_crop" "imdb_crop".

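The second way applies if your data is stored as an archive: you can save time by extracting it straight into CoLab storage.

If you are using a tar file: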

from google.colab import drive
import tarfile
drive.mount('/content/gdrive')
fname = 'gdrive/My Drive/IMDB_Dataset.tar'
tar = tarfile.open(fname, "r:")
tar.extractall()
tar.close()

If you are using a zip file:

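from google.colab import drive
import zipfile

drive.mount('/content/gdrive')

# Drive is mounted at /content/gdrive, so the path starts with gdrive/
filename = "gdrive/My Drive/IMDB_Dataset.zip"
with zipfile.ZipFile(filename, 'r') as zipp:
    zipp.extractall()  # the with block closes the archive automatically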
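If you would rather not keep separate snippets for each archive type, the standard library's shutil.unpack_archive handles both tar and zip files. This is a sketch of an alternative to the two snippets above, not the approach used in this answer; the helper name `extract_to_local` and the explicit destination directory are assumptions.

```python
import shutil
from pathlib import Path


def extract_to_local(archive_path, dest_dir):
    """Extract a .zip or .tar archive into `dest_dir` and return that directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive picks the format from the file extension.
    shutil.unpack_archive(str(archive_path), str(dest))
    return dest
```

With Drive mounted as above, a call such as `extract_to_local('gdrive/My Drive/IMDB_Dataset.zip', 'imdb_crop')` would place the contents in a local imdb_crop folder.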

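To confirm that it worked, check the CoLab file browser (visible in the side panel): the imdb_crop folder (in this example) should appear in the home directory, outside gdrive.

When I read the files directly from the imdb_crop folder, it is much faster.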
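Putting both answers together for the read, resize, write loop in the question, a rough sketch could look like the following. It assumes the whole dataset fits on the Colab disk; the function `process_locally`, the `in`/`out` subfolder names, and the `process` callback are hypothetical names introduced for this example.

```python
import shutil
from pathlib import Path


def process_locally(drive_src, local_work, drive_dst, process):
    """Copy inputs from Drive to local disk, process them there, copy results back.

    `process(in_path, out_path)` is supplied by the caller, for example a
    function that calls cv2.imread, cv2.resize and cv2.imwrite.
    `local_work/in` must not exist yet, because shutil.copytree creates it.
    Returns the number of files processed.
    """
    local_in = Path(local_work) / "in"
    local_out = Path(local_work) / "out"
    shutil.copytree(drive_src, local_in)  # one bulk read from Drive
    local_out.mkdir(parents=True)
    count = 0
    for in_path in sorted(local_in.glob("*.jpg")):
        process(in_path, local_out / in_path.name)  # all I/O is local here
        count += 1
    # One bulk write back; dirs_exist_ok (Python 3.8+) allows an existing folder.
    shutil.copytree(local_out, drive_dst, dirs_exist_ok=True)
    return count
```

The point of the design is that Drive is touched only twice, in bulk, while the per-image work happens entirely on local storage. For very large datasets, copying an archive and extracting it locally may be faster than the first copytree call.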

OpenCV reads and writes on Google Drive are too slow when using CoLab