YOLOv4 on Colab Pro reports 30 hours of training time with only 340 training images

Date: 2021-02-06 04:46:51

Tags: deep-learning computer-vision google-colaboratory object-detection yolo

I'm trying out my model on Colab Pro, testing with only 340 training images and 16 classes. However, the training log tells me there are about 30 hours of training time left:

(next mAP calculation at 1200 iterations) 
 Last accuracy mAP@0.5 = 0.37 %, best = 0.37 % 
 1187: 3.270728, 3.027621 avg loss, 0.010000 rate, 1.429193 seconds, 75968 images, 30.824708 hours left
Loaded: 1.136631 seconds - performance bottleneck on CPU or Disk HDD/SSD
...
...
...
 (next mAP calculation at 1300 iterations) 
 Last accuracy mAP@0.5 = 0.33 %, best = 0.37 % 
 1278: 3.231166, 2.967602 avg loss, 0.010000 rate, 2.552415 seconds, 81792 images, 30.512658 hours left
Loaded: 0.712928 seconds - performance bottleneck on CPU or Disk HDD/SSD

I don't know why it is doing this. I only have a small dataset.
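For what it's worth, the numbers in the log are internally consistent, assuming darknet's usual reporting: the "images" column is iteration × batch, and "hours left" is roughly (max_batches − iteration) × average seconds per iteration / 3600. The 32000 used below is an assumption about my max_batches value:

```python
# Sanity-check the two log lines above (assumed darknet formulas).
batch = 64
for iteration, images, hours_left in [(1187, 75968, 30.824708),
                                      (1278, 81792, 30.512658)]:
    # "images" column should be iteration * batch
    assert iteration * batch == images
    # Implied average time per iteration if max_batches = 32000:
    s = hours_left * 3600 / (32000 - iteration)
    print(f"iter {iteration}: ~{s:.2f} s/iteration")
```

Both lines imply roughly 3.6 s per iteration, which matches the reported ~1.4–2.6 s compute time plus the ~0.7–1.1 s "Loaded" (data loading) time.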

Here are my cfg parameters:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=1024
height=1024
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
 
learning_rate=0.01
burn_in=1000
max_batches = {max_batches}
policy=steps
steps={steps_str}
scales=.1,.1
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers=-1
groups=2
group_id=1
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers = -1,-2
 
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
 
[route]
layers = -6,-1
 
[maxpool]
size=2
stride=2
 
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers=-1
groups=2
group_id=1
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers = -1,-2
 
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
 
[route]
layers = -6,-1
 
[maxpool]
size=2
stride=2
 
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers=-1
groups=2
group_id=1
 
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers = -1,-2
 
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
 
[route]
layers = -6,-1
 
[maxpool]
size=2
stride=2
 
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
 
##################################
 
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
size=1
stride=1
pad=1
filters={num_filters}
activation=linear
 
 
 
[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes={num_classes}
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
truth_thresh = 1
random=1
nms_kind=greedynms
beta_nms=0.6
ignore_thresh = .9 
iou_normalizer=0.5 
iou_loss=giou
 
[route]
layers = -4
 
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
 
[upsample]
stride=2
 
[route]
layers = -1, 23
 
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
size=1
stride=1
pad=1
filters={num_filters}
activation=linear
 
[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes={num_classes}
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
ignore_thresh = .9 
iou_normalizer=0.5
iou_loss=giou
truth_thresh = 1
random=1
nms_kind=greedynms
beta_nms=0.6
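The placeholders in the cfg ({max_batches}, {steps_str}, {num_filters}, {num_classes}) are filled in from a script. A minimal sketch of how that might look, following the common AlexeyAB/darknet guidelines (max_batches = classes×2000, at least 6000; steps at 80% and 90% of max_batches; filters = (classes + 5) × anchors-per-[yolo]-mask). `cfg_template` here is a hypothetical stand-in for the full file shown above:

```python
# Stand-in for the full cfg file above; only the placeholder lines are shown.
cfg_template = """\
max_batches = {max_batches}
steps={steps_str}
filters={num_filters}
classes={num_classes}
"""

num_classes = 16
max_batches = max(num_classes * 2000, 6000)   # guideline: classes*2000, min 6000 -> 32000
steps_str = f"{int(0.8 * max_batches)},{int(0.9 * max_batches)}"  # LR decay at 80%/90%
num_filters = (num_classes + 5) * 3           # 3 anchors per [yolo] mask -> 63

cfg = cfg_template.format(max_batches=max_batches, steps_str=steps_str,
                          num_filters=num_filters, num_classes=num_classes)
print(cfg)
```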

1 Answer:

Answer 0 (score: 1)

Your training time depends on the max_batches parameter, which is essentially the maximum number of training iterations (batches).

According to this repo's recommendation, max_batches should be classes*2000. In your case that is 16*2000 = 32,000 iterations, which is why training takes so long despite the small dataset.
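To make the arithmetic concrete: darknet trains for a fixed number of iterations, not epochs, so a small dataset is simply cycled through more often. The ~3.6 s/iteration figure below is an assumption read off the "hours left" values in the question's log:

```python
# Why a small dataset still takes ~30 h: training length is fixed in iterations.
num_classes = 16
max_batches = num_classes * 2000          # 32000, per the repo's guideline
batch = 64                                # from the cfg
train_images = 340

epochs = max_batches * batch / train_images   # passes over the dataset
sec_per_iter = 3.6                            # implied by the log's "hours left"
hours = max_batches * sec_per_iter / 3600

print(f"{epochs:.0f} epochs, ~{hours:.0f} h total")
```

So the 340 images are seen roughly 6,000 times, and 32,000 iterations at ~3.6 s each comes to about 32 hours, matching the ~30 hours remaining reported at iteration 1187.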