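I'm running a convnet on a Colab Pro GPU. I selected GPU as the runtime accelerator and can confirm that a GPU is available. I'm running exactly the same network as last night, but each epoch now takes about 2 hours, whereas last night each epoch took about 3 minutes... nothing has changed at all. I suspect Colab may be throttling my GPU usage, but I don't know how to determine whether that is the problem. Does GPU speed fluctuate a lot depending on the time of day? Below is some diagnostic information I printed. Does anyone know how I can dig into the root cause of this slowdown?
I also tried changing the accelerator in Colab to "None", and my network ran at the same speed as with "GPU" selected, which suggests that for some reason I'm no longer training on the GPU, or that resources are severely limited. I'm using TensorFlow 2.1.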
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Select the Runtime → "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)
Sun Mar 22 11:33:14 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    32W / 250W |   8747MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
import humanize
import psutil
import GPUtil

def mem_report():
    print("CPU RAM Free: " + humanize.naturalsize( psutil.virtual_memory().available ))
    GPUs = GPUtil.getGPUs()
    for i, gpu in enumerate(GPUs):
        # gpu.memoryUtil is the fraction of GPU memory in use, not compute utilization.
        print('GPU {:d} ... Mem Free: {:.0f}MB / {:.0f}MB | Utilization {:3.0f}%'.format(i, gpu.memoryFree, gpu.memoryTotal, gpu.memoryUtil*100))

mem_report()
CPU RAM Free: 24.5 GB
GPU 0 ... Mem Free: 7533MB / 16280MB | Utilization 54%
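nvidia-smi only shows that a GPU is attached to the VM; it does not show whether TensorFlow can see it. A minimal sketch of a check from the TensorFlow side, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available). `visible_gpus` is a hypothetical helper name, not something from the notebook:

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("TensorFlow sees:", gpus)
else:
    print("TensorFlow sees no GPU; training would fall back to the CPU.")
```

If this lists a GPU and training is still slow, one possible reading of the 0% GPU-Util with 8747MiB allocated above is that the device is claimed but idle, for example while waiting for input data.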
Still no luck speeding things up. Here is my code; maybe I've overlooked something. The training images are stored on my Google Drive. https://www.kaggle.com/c/datasciencebowl
#loading images from kaggle api
#os.environ['KAGGLE_USERNAME'] = ""
#os.environ['KAGGLE_KEY'] = ""
#!kaggle competitions download -c datasciencebowl
#unpacking zip files
#zipfile.ZipFile('./sampleSubmission.csv.zip', 'r').extractall('./')
#zipfile.ZipFile('./test.zip', 'r').extractall('./')
#zipfile.ZipFile('./train.zip', 'r').extractall('./')
import pathlib
import numpy as np
import tensorflow as tf
from IPython import display
from PIL import Image

data_dir = pathlib.Path('train')
image_count = len(list(data_dir.glob('*/*.jpg')))
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
shrimp_zoea = list(data_dir.glob('shrimp_zoea/*'))
for image_path in shrimp_zoea[:5]:
    display.display(Image.open(str(image_path)))
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                                                  validation_split=0.2)
                                                                  #rotation_range = 40,
                                                                  #width_shift_range = 0.2,
                                                                  #height_shift_range = 0.2,
                                                                  #shear_range = 0.2,
                                                                  #zoom_range = 0.2,
                                                                  #horizontal_flip = True,
                                                                  #fill_mode='nearest')
validation_split = 0.2
BATCH_SIZE = 32
BATCH_SIZE_VALID = 10
IMG_HEIGHT = 224
IMG_WIDTH = 224
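STEPS_PER_EPOCH = int(np.ceil(image_count * (1 - validation_split) / BATCH_SIZE))
# Validation batches hold BATCH_SIZE_VALID images, so divide by that batch size.
VALIDATION_STEPS = int(np.ceil(image_count * validation_split / BATCH_SIZE_VALID))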
train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                     subset='training',
                                                     batch_size=BATCH_SIZE,
                                                     class_mode='categorical',
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     classes=list(CLASS_NAMES))
validation_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                          subset='validation',
                                                          batch_size=BATCH_SIZE_VALID,
                                                          class_mode='categorical',
                                                          shuffle=True,
                                                          target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                          classes=list(CLASS_NAMES))
model_basic = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(121, activation='softmax')
])
model_basic.summary()
model_basic.compile(optimizer='adam',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])
history = model_basic.fit(
    train_data_gen,
    epochs=10,
    verbose=1,
    validation_data=validation_data_gen,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_steps=VALIDATION_STEPS,
    initial_epoch=0
)
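One way to tell whether the time goes into the model or into feeding it is to time the data generator alone, without calling `fit`. The following is only a sketch: `time_batches` and `slow_batches` are hypothetical names, and `slow_batches` is a stand-in that simulates a slow generator so the snippet runs outside the notebook.

```python
import time
from itertools import islice


def time_batches(batches, n=20):
    """Pull up to n batches from an iterable; return (batches_read, mean seconds per batch)."""
    count = 0
    start = time.perf_counter()
    for _ in islice(batches, n):
        count += 1
    elapsed = time.perf_counter() - start
    return count, (elapsed / count if count else 0.0)


def slow_batches():
    """Stand-in for a data generator that needs about 10 ms to produce each batch."""
    while True:
        time.sleep(0.01)
        yield None


count, per_batch = time_batches(slow_batches(), n=5)
print(f"{count} batches, {per_batch:.3f} s per batch")
```

In the notebook the call would be `time_batches(train_data_gen)`. If the seconds per batch multiplied by `STEPS_PER_EPOCH` comes close to the observed epoch time, the input pipeline rather than the GPU is the likely bottleneck.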
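Answer 0 (score: 1)
Your nvidia-smi output clearly shows that a GPU is attached. Where are you storing your training data? If it isn't on local disk, I'd suggest moving it there.
The speed of transferring training data from remote storage can vary depending on where your Colab backend is located.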
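To compare storage locations directly, a rough sketch is to read the same sample of files from each one and time it. `read_throughput` is a hypothetical helper, and the two paths in the usage comment are assumptions about where a Drive mount and a local copy might live:

```python
import time
from pathlib import Path


def read_throughput(directory, limit=200):
    """Read up to `limit` files under `directory`; return (files_read, bytes_read, seconds)."""
    files = [p for p in sorted(Path(directory).rglob("*")) if p.is_file()][:limit]
    total_bytes = 0
    start = time.perf_counter()
    for path in files:
        total_bytes += len(path.read_bytes())
    return len(files), total_bytes, time.perf_counter() - start


# Hypothetical usage in Colab (both paths are assumptions):
# print(read_throughput("/content/drive/My Drive/train"))
# print(read_throughput("/content/train_local"))
```

Repeated reads of the same files may be served from a cache, so the first run against each location is the more telling number.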
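Answer 1 (score: 1)
In the end, the bottleneck turned out to be loading the images from Google Drive into Colab for every batch. Loading the images onto the local disk cut the time per epoch to about 30 seconds... Here is the code I used to load them onto disk: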
!mkdir train_local
!unzip train.zip -d train_local
I ran this after uploading my train.zip file to Colab.
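The same step can be done from Python with the standard library instead of shell commands. This is a sketch only; `extract_locally` is a hypothetical helper, and the resulting `train_local/train` layout assumes the archive contains a top-level `train` folder:

```python
import zipfile
from pathlib import Path


def extract_locally(zip_path, dest_dir="train_local"):
    """Extract zip_path into dest_dir on local disk and return the number of files there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sum(1 for p in dest.rglob("*") if p.is_file())


# Hypothetical usage, assuming train.zip is in the working directory:
# extract_locally("train.zip")
# data_dir = Path("train_local/train")  # assumes the zip has a top-level train/ folder
```

After extracting, `data_dir` has to point at the local copy so that `flow_from_directory` reads from local disk rather than from Drive.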
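Answer 2 (score: 0)
From Colab's FAQ:
The types of GPUs that are available in Colab vary over time. This is necessary for Colab to be able to provide access to these resources for free. The GPUs available in Colab often include Nvidia K80s, T4s, P4s and P100s. There is no way to choose what type of GPU you can connect to in Colab at any given time. Users who are interested in more reliable access to Colab's fastest GPUs may be interested in Colab Pro.
If the code hasn't changed, the problem is probably related to the performance characteristics of the GPU type you happened to be connected to.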
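To record which GPU a session was given, and so compare a fast run with a slow one, something like the sketch below could be run at the top of the notebook. `attached_gpus` is a hypothetical helper; it relies on nvidia-smi's `--query-gpu` and `--format` options and returns an empty list when nvidia-smi is not available:

```python
import shutil
import subprocess


def attached_gpus():
    """Return 'name, total memory' strings reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(attached_gpus() or "No NVIDIA GPU visible to nvidia-smi")
```

In the question's output the attached card is a Tesla P100, which is on the FAQ's list, so logging the card for both the fast and the slow session would show whether the GPU type actually differed.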