
My hardware details:

Motherboard: Asus WS X299 SAGE/10G
CPU: Intel Core i9-9900X
GPU: 4 x GeForce RTX 2080 Ti (11 GB each)
Power supply: MasterWatt Maker, 1500 W

System details:

BIOS version: 1201
OS: Ubuntu 18.04
NVIDIA driver: 418.56
CUDA (under Conda): 10.0.130

I tested the GPUs with https://github.com/wilicc/gpu-burn. All of them passed.

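Whenever I train maskrcnn_resnet50_fpn (https://github.com/pytorch/vision/tree/master/references/detection) on the COCO dataset using 4 GPUs with a batch size of 4, the system reboots immediately. However, when I use 3 GPUs with a batch size of 4, or 4 GPUs with a batch size of 2, training runs normally.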
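Because the reboot happens only with the heaviest configuration, one way to check whether power draw is a factor would be to log per-GPU power while the job starts. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and that the driver reports `power.draw` for these cards; the function names `parse_power_csv`, `sample_power`, and `log_power` and the 1-second interval are illustrative choices, not part of any existing tool.

```python
import subprocess
import time


def parse_power_csv(text):
    """Parse 'index, power.draw' rows (csv,noheader,nounits) into {index: watts}."""
    readings = {}
    for line in text.strip().splitlines():
        try:
            index, power = [field.strip() for field in line.split(",", 1)]
            readings[int(index)] = float(power)
        except ValueError:
            # Rows without a numeric reading (for example "[N/A]") are skipped.
            continue
    return readings


def sample_power():
    """Query nvidia-smi once and return per-GPU power draw in watts."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,power.draw",
            "--format=csv,noheader,nounits",
        ]
    ).decode()
    return parse_power_csv(out)


def log_power(interval_s=1.0):
    """Print per-GPU and total power draw until interrupted (Ctrl+C)."""
    while True:
        readings = sample_power()
        total = sum(readings.values())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {readings} total={total:.1f} W", flush=True)
        time.sleep(interval_s)
```

Calling `log_power()` in a second terminal just before launching training would show the last readings before the reboot. A 1-second sampling interval may miss short transient spikes, so a normal-looking log would not rule out the power supply.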

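What could be the cause? The power supply? I would really like to solve this. Thank you for any comments, and thanks in advance. Zulfi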

System reboots when training PyTorch Mask R-CNN using 4 GPUs

0 answers: