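I am getting this error:
RuntimeError: CUDA out of memory
GPU 0; 1.95 GiB total capacity; 1.23 GiB already allocated; 1.27 GiB reserved in total by PyTorch
But this does not look like a genuine out-of-memory condition; it seems to me that PyTorch is allocating the wrong amount of memory. I changed the batch size to 1, killed every application that was using memory, and rebooted, but nothing worked.
This is how I run it. Please tell me what information is needed to fix this, or where I should look. Thanks.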
python train.py --img 416 --batch 16 --epochs 1 \
--data '../data.yaml' --cfg ./models/yolov4-csp.yaml \
--weights '' --name yolov4-csp-results --cache
Using CUDA device0 _CudaDeviceProperties(name='Quadro P620', total_memory=2000MB)
Namespace(adam=False, batch_size=16, bucket='', cache_images=True, cfg='./models/yolov4-csp.yaml', data='../data.yaml', device='', epochs=1, evolve=False, global_rank=-1, hyp='data/hyp.scratch.yaml', img_size=[416, 416], local_rank=-1, logdir='runs/', multi_scale=False, name='yolov4-csp-results', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='', world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Overriding ./models/yolov4-csp.yaml nc=80 with nc=1
from n params module arguments
0 -1 1 928 models.common.Conv [3, 32, 3, 1]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 20672 models.common.Bottleneck [64, 64]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 119936 models.common.BottleneckCSP [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 1463552 models.common.BottleneckCSP [256, 256, 8]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 5843456 models.common.BottleneckCSP [512, 512, 8]
9 -1 1 4720640 models.common.Conv [512, 1024, 3, 2]
10 -1 1 12858368 models.common.BottleneckCSP [1024, 1024, 4]
11 -1 1 7610368 models.common.SPPCSP [1024, 512, 1]
12 -1 1 131584 models.common.Conv [512, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 8 1 131584 models.common.Conv [512, 256, 1, 1]
15 [-1, -2] 1 0 models.common.Concat [1]
16 -1 1 1642496 models.common.BottleneckCSP2 [512, 256, 2]
17 -1 1 33024 models.common.Conv [256, 128, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 6 1 33024 models.common.Conv [256, 128, 1, 1]
20 [-1, -2] 1 0 models.common.Concat [1]
21 -1 1 411648 models.common.BottleneckCSP2 [256, 128, 2]
22 -1 1 295424 models.common.Conv [128, 256, 3, 1]
23 -2 1 295424 models.common.Conv [128, 256, 3, 2]
24 [-1, 16] 1 0 models.common.Concat [1]
25 -1 1 1642496 models.common.BottleneckCSP2 [512, 256, 2]
26 -1 1 1180672 models.common.Conv [256, 512, 3, 1]
27 -2 1 1180672 models.common.Conv [256, 512, 3, 2]
28 [-1, 11] 1 0 models.common.Concat [1]
29 -1 1 6561792 models.common.BottleneckCSP2 [1024, 512, 2]
30 -1 1 4720640 models.common.Conv [512, 1024, 3, 1]
31 [22, 26, 30] 1 32310 models.yolo.Detect [1, [[12, 16, 19, 36, 40, 28], [36, 75, 76, 55, 72, 146], [142, 110, 192, 243, 459, 401]], [256, 512, 1024]]
Model Summary: 334 layers, 5.24994e+07 parameters, 5.24994e+07 gradients
Optimizer groups: 111 .bias, 115 conv.weight, 108 other
Scanning labels ../train/labels.cache (78 found, 0 missing, 0 empty, 0 duplicate, for 78 images): 100%|█| 78/78 [00:00<0
Caching images (0.0GB): 100%|██████████| 78/78 [00:00<00:00, 305.27it/s]
Scanning labels ../valid/labels.cache (15 found, 0 missing, 0 empty, 0 duplicate, for 15 images): 100%|█| 15/15 [00:00<0
Caching images (0.0GB): 100%|██████████| 15/15 [00:00<00:00, 333.01it/s]
Analyzing anchors... anchors/target = 4.64, Best Possible Recall (BPR) = 1.0000
Image sizes 416 train, 416 test
Using 8 dataloader workers
Starting training for 1 epochs...
Epoch gpu_mem GIoU obj cls total targets img_size
0%| | 0/5 [00:04<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 443, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 256, in train
pred = model(imgs)
File "/home/ctdi/anaconda3/envs/scaled-yolov4.03/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ctdi/content/ScaledYOLOv4/models/yolo.py", line 109, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/home/ctdi/content/ScaledYOLOv4/models/yolo.py", line 129, in forward_once
x = m(x) # run
File "/home/ctdi/anaconda3/envs/scaled-yolov4.03/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ctdi/content/ScaledYOLOv4/models/common.py", line 47, in forward
return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
File "/home/ctdi/anaconda3/envs/scaled-yolov4.03/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ctdi/content/ScaledYOLOv4/models/common.py", line 31, in forward
return self.act(self.bn(self.conv(x)))
File "/home/ctdi/anaconda3/envs/scaled-yolov4.03/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ctdi/anaconda3/envs/scaled-yolov4.03/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
File "/home/ctdi/anaconda3/envs/scaled-yolov4.03/lib/python3.6/site-packages/torch/nn/functional.py", line 2059, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 1.95 GiB total capacity; 1.23 GiB already allocated; 26.94 MiB free; 1.27 GiB reserved in total by PyTorch)
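To see where the 1.95 GiB goes, the figures in the message above can be broken down with a little arithmetic. The sketch below is only an illustration: the helper names `parse_oom_message` and `memory_outside_pytorch` are made up for this post, and the regular expressions assume the message wording shown in the traceback, which may differ in other PyTorch versions. The memory that is neither free nor reserved by PyTorch is typically held by the CUDA context itself and by other processes such as a desktop session, although the message alone cannot tell which.

```python
import re

MIB_PER_GIB = 1024.0


def parse_oom_message(message):
    """Return the sizes reported in a CUDA OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved in total by PyTorch",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError("field not found in message: " + name)
        value, unit = match.groups()
        sizes[name] = float(value) * (MIB_PER_GIB if unit == "GiB" else 1.0)
    return sizes


def memory_outside_pytorch(sizes):
    """MiB that are neither free nor reserved by PyTorch's caching allocator."""
    return sizes["total"] - sizes["reserved"] - sizes["free"]


# The message copied from the traceback above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 1.95 GiB total "
    "capacity; 1.23 GiB already allocated; 26.94 MiB free; 1.27 GiB reserved "
    "in total by PyTorch)"
)

sizes = parse_oom_message(MESSAGE)
for name, mib in sizes.items():
    print("{:>10}: {:8.2f} MiB".format(name, mib))
print("{:>10}: {:8.2f} MiB".format("outside", memory_outside_pytorch(sizes)))
```

With these numbers, roughly 670 MiB of the card is in use outside PyTorch's reservation, and the 44 MiB request is larger than the 26.94 MiB still free, which is why the allocation fails even though PyTorch has reserved only 1.27 GiB.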
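Answer 0 (score: 0)
I finally figured it out. The problem was that I was using the new CUDA 11.2, which did not work well with this setup. I removed it and installed CUDA 10.2, and that solved the problem.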
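If you want to check which CUDA version your PyTorch build was compiled against before reinstalling anything, something like the following may help. This is a rough sketch rather than an official diagnostic: `cuda_report` is a name invented for this post, and it assumes a PyTorch version recent enough to provide `torch.cuda.memory_reserved`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which is not necessarily the toolkit installed on the system, so it is worth comparing it with what `nvidia-smi` shows for the driver.

```python
try:
    import torch
except ImportError:  # allow the script to run and report even without PyTorch
    torch = None


def cuda_report():
    """Collect the CUDA details relevant to a version mismatch or low memory."""
    report = {"torch_installed": torch is not None}
    if torch is None:
        return report

    report["torch_version"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        mib = 1024 ** 2
        props = torch.cuda.get_device_properties(0)
        report["device_name"] = props.name
        report["total_mib"] = props.total_memory / mib
        # Memory held by tensors vs. memory cached by PyTorch's allocator.
        report["allocated_mib"] = torch.cuda.memory_allocated(0) / mib
        report["reserved_mib"] = torch.cuda.memory_reserved(0) / mib
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print("{}: {}".format(key, value))
```

Run it in the same conda environment you use for training, so that it reports on the PyTorch build that train.py actually imports.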