I am training a Faster R-CNN network on the COCO dataset with PyTorch.
I followed this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
The training results are as follows:
Epoch: [6] [ 0/119] eta: 0:01:16 lr: 0.000050 loss: 0.3780 (0.3780) loss_classifier: 0.1290 (0.1290) loss_box_reg: 0.1848 (0.1848) loss_objectness: 0.0239 (0.0239) loss_rpn_box_reg: 0.0403 (0.0403) time: 0.6451 data: 0.1165 max mem: 3105
Epoch: [6] [ 10/119] eta: 0:01:13 lr: 0.000050 loss: 0.4129 (0.4104) loss_classifier: 0.1277 (0.1263) loss_box_reg: 0.2164 (0.2059) loss_objectness: 0.0244 (0.0309) loss_rpn_box_reg: 0.0487 (0.0473) time: 0.6770 data: 0.1253 max mem: 3105
Epoch: [6] [ 20/119] eta: 0:01:07 lr: 0.000050 loss: 0.4165 (0.4302) loss_classifier: 0.1277 (0.1290) loss_box_reg: 0.2180 (0.2136) loss_objectness: 0.0353 (0.0385) loss_rpn_box_reg: 0.0499 (0.0491) time: 0.6843 data: 0.1265 max mem: 3105
Epoch: [6] [ 30/119] eta: 0:01:00 lr: 0.000050 loss: 0.4205 (0.4228) loss_classifier: 0.1271 (0.1277) loss_box_reg: 0.2125 (0.2093) loss_objectness: 0.0334 (0.0374) loss_rpn_box_reg: 0.0499 (0.0484) time: 0.6819 data: 0.1274 max mem: 3105
Epoch: [6] [ 40/119] eta: 0:00:53 lr: 0.000050 loss: 0.4127 (0.4205) loss_classifier: 0.1209 (0.1265) loss_box_reg: 0.2102 (0.2085) loss_objectness: 0.0315 (0.0376) loss_rpn_box_reg: 0.0475 (0.0479) time: 0.6748 data: 0.1282 max mem: 3105
Epoch: [6] [ 50/119] eta: 0:00:46 lr: 0.000050 loss: 0.3973 (0.4123) loss_classifier: 0.1202 (0.1248) loss_box_reg: 0.1947 (0.2039) loss_objectness: 0.0315 (0.0366) loss_rpn_box_reg: 0.0459 (0.0470) time: 0.6730 data: 0.1297 max mem: 3105
Epoch: [6] [ 60/119] eta: 0:00:39 lr: 0.000050 loss: 0.3900 (0.4109) loss_classifier: 0.1206 (0.1248) loss_box_reg: 0.1876 (0.2030) loss_objectness: 0.0345 (0.0365) loss_rpn_box_reg: 0.0431 (0.0467) time: 0.6692 data: 0.1276 max mem: 3105
Epoch: [6] [ 70/119] eta: 0:00:33 lr: 0.000050 loss: 0.3984 (0.4085) loss_classifier: 0.1172 (0.1242) loss_box_reg: 0.2069 (0.2024) loss_objectness: 0.0328 (0.0354) loss_rpn_box_reg: 0.0458 (0.0464) time: 0.6707 data: 0.1252 max mem: 3105
Epoch: [6] [ 80/119] eta: 0:00:26 lr: 0.000050 loss: 0.4153 (0.4113) loss_classifier: 0.1178 (0.1246) loss_box_reg: 0.2123 (0.2036) loss_objectness: 0.0328 (0.0364) loss_rpn_box_reg: 0.0480 (0.0468) time: 0.6744 data: 0.1264 max mem: 3105
Epoch: [6] [ 90/119] eta: 0:00:19 lr: 0.000050 loss: 0.4294 (0.4107) loss_classifier: 0.1178 (0.1238) loss_box_reg: 0.2098 (0.2021) loss_objectness: 0.0418 (0.0381) loss_rpn_box_reg: 0.0495 (0.0466) time: 0.6856 data: 0.1302 max mem: 3105
Epoch: [6] [100/119] eta: 0:00:12 lr: 0.000050 loss: 0.4295 (0.4135) loss_classifier: 0.1171 (0.1235) loss_box_reg: 0.2124 (0.2034) loss_objectness: 0.0460 (0.0397) loss_rpn_box_reg: 0.0498 (0.0469) time: 0.6955 data: 0.1345 max mem: 3105
Epoch: [6] [110/119] eta: 0:00:06 lr: 0.000050 loss: 0.4126 (0.4117) loss_classifier: 0.1229 (0.1233) loss_box_reg: 0.2119 (0.2024) loss_objectness: 0.0430 (0.0394) loss_rpn_box_reg: 0.0481 (0.0466) time: 0.6822 data: 0.1306 max mem: 3105
Epoch: [6] [118/119] eta: 0:00:00 lr: 0.000050 loss: 0.4006 (0.4113) loss_classifier: 0.1171 (0.1227) loss_box_reg: 0.2028 (0.2028) loss_objectness: 0.0366 (0.0391) loss_rpn_box_reg: 0.0481 (0.0466) time: 0.6583 data: 0.1230 max mem: 3105
Epoch: [6] Total time: 0:01:20 (0.6760 s / it)
creating index...
index created!
Test: [ 0/59] eta: 0:00:15 model_time: 0.1188 (0.1188) evaluator_time: 0.0697 (0.0697) time: 0.2561 data: 0.0634 max mem: 3105
Test: [58/59] eta: 0:00:00 model_time: 0.1086 (0.1092) evaluator_time: 0.0439 (0.0607) time: 0.2361 data: 0.0629 max mem: 3105
Test: Total time: 0:00:14 (0.2378 s / it)
Averaged stats: model_time: 0.1086 (0.1092) evaluator_time: 0.0439 (0.0607)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.643
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.079
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.096
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Epoch: [7] [ 0/119] eta: 0:01:16 lr: 0.000050 loss: 0.3851 (0.3851) loss_classifier: 0.1334 (0.1334) loss_box_reg: 0.1845 (0.1845) loss_objectness: 0.0287 (0.0287) loss_rpn_box_reg: 0.0385 (0.0385) time: 0.6433 data: 0.1150 max mem: 3105
Epoch: [7] [ 10/119] eta: 0:01:12 lr: 0.000050 loss: 0.3997 (0.4045) loss_classifier: 0.1250 (0.1259) loss_box_reg: 0.1973 (0.2023) loss_objectness: 0.0292 (0.0303) loss_rpn_box_reg: 0.0479 (0.0459) time: 0.6692 data: 0.1252 max mem: 3105
Epoch: [7] [ 20/119] eta: 0:01:07 lr: 0.000050 loss: 0.4224 (0.4219) loss_classifier: 0.1250 (0.1262) loss_box_reg: 0.2143 (0.2101) loss_objectness: 0.0333 (0.0373) loss_rpn_box_reg: 0.0493 (0.0484) time: 0.6809 data: 0.1286 max mem: 3105
Epoch: [7] [ 30/119] eta: 0:01:00 lr: 0.000050 loss: 0.4120 (0.4140) loss_classifier: 0.1191 (0.1221) loss_box_reg: 0.2113 (0.2070) loss_objectness: 0.0357 (0.0374) loss_rpn_box_reg: 0.0506 (0.0475) time: 0.6834 data: 0.1316 max mem: 3105
Epoch: [7] [ 40/119] eta: 0:00:53 lr: 0.000050 loss: 0.4013 (0.4117) loss_classifier: 0.1118 (0.1210) loss_box_reg: 0.2079 (0.2063) loss_objectness: 0.0357 (0.0371) loss_rpn_box_reg: 0.0471 (0.0473) time: 0.6780 data: 0.1304 max mem: 3105
Epoch: [7] [ 50/119] eta: 0:00:46 lr: 0.000050 loss: 0.3911 (0.4035) loss_classifier: 0.1172 (0.1198) loss_box_reg: 0.1912 (0.2017) loss_objectness: 0.0341 (0.0356) loss_rpn_box_reg: 0.0449 (0.0464) time: 0.6768 data: 0.1314 max mem: 3105
Epoch: [7] [ 60/119] eta: 0:00:39 lr: 0.000050 loss: 0.3911 (0.4048) loss_classifier: 0.1186 (0.1213) loss_box_reg: 0.1859 (0.2013) loss_objectness: 0.0334 (0.0360) loss_rpn_box_reg: 0.0412 (0.0462) time: 0.6729 data: 0.1306 max mem: 3105
Epoch: [7] [ 70/119] eta: 0:00:33 lr: 0.000050 loss: 0.4046 (0.4030) loss_classifier: 0.1177 (0.1209) loss_box_reg: 0.2105 (0.2008) loss_objectness: 0.0359 (0.0354) loss_rpn_box_reg: 0.0462 (0.0459) time: 0.6718 data: 0.1282 max mem: 3105
Epoch: [7] [ 80/119] eta: 0:00:26 lr: 0.000050 loss: 0.4125 (0.4067) loss_classifier: 0.1187 (0.1221) loss_box_reg: 0.2105 (0.2022) loss_objectness: 0.0362 (0.0362) loss_rpn_box_reg: 0.0469 (0.0462) time: 0.6725 data: 0.1285 max mem: 3105
Epoch: [7] [ 90/119] eta: 0:00:19 lr: 0.000050 loss: 0.4289 (0.4068) loss_classifier: 0.1288 (0.1223) loss_box_reg: 0.2097 (0.2009) loss_objectness: 0.0434 (0.0375) loss_rpn_box_reg: 0.0479 (0.0461) time: 0.6874 data: 0.1327 max mem: 3105
Epoch: [7] [100/119] eta: 0:00:12 lr: 0.000050 loss: 0.4222 (0.4086) loss_classifier: 0.1223 (0.1221) loss_box_reg: 0.2101 (0.2021) loss_objectness: 0.0405 (0.0381) loss_rpn_box_reg: 0.0483 (0.0463) time: 0.6941 data: 0.1348 max mem: 3105
Epoch: [7] [110/119] eta: 0:00:06 lr: 0.000050 loss: 0.4082 (0.4072) loss_classifier: 0.1196 (0.1220) loss_box_reg: 0.2081 (0.2013) loss_objectness: 0.0350 (0.0379) loss_rpn_box_reg: 0.0475 (0.0461) time: 0.6792 data: 0.1301 max mem: 3105
Epoch: [7] [118/119] eta: 0:00:00 lr: 0.000050 loss: 0.4070 (0.4076) loss_classifier: 0.1196 (0.1223) loss_box_reg: 0.2063 (0.2016) loss_objectness: 0.0313 (0.0375) loss_rpn_box_reg: 0.0475 (0.0462) time: 0.6599 data: 0.1255 max mem: 3105
Epoch: [7] Total time: 0:01:20 (0.6763 s / it)
creating index...
index created!
Test: [ 0/59] eta: 0:00:14 model_time: 0.1194 (0.1194) evaluator_time: 0.0633 (0.0633) time: 0.2511 data: 0.0642 max mem: 3105
Test: [58/59] eta: 0:00:00 model_time: 0.1098 (0.1102) evaluator_time: 0.0481 (0.0590) time: 0.2353 data: 0.0625 max mem: 3105
Test: Total time: 0:00:13 (0.2371 s / it)
Averaged stats: model_time: 0.1098 (0.1102) evaluator_time: 0.0481 (0.0590)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.649
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.079
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.095
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
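I have two questions:
Overfitting: I don't know whether my model is overfitting or underfitting. How can I find that out from the metrics?
Saving the best model across all epochs: how do I save the best of the models trained at different epochs? Based on the results, which epoch is best?
Thanks!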
Answer 0 (score: 1)
You need to track the loss on the test dataset (or another metric, such as recall). Note this part of the code:
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
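train_one_epoch and evaluate are defined here. The evaluate function returns an object of type CocoEvaluator, but you can modify the code so that it returns the test loss instead (you would need to either extract the metric from the CocoEvaluator object somehow or write your own metric evaluation).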
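As a sketch of extracting the metric, the snippet below reads the box AP from the evaluator and keeps the best checkpoint. It assumes the returned object exposes `coco_eval["bbox"].stats`, as the `CocoEvaluator` in torchvision's detection reference scripts does once `summarize()` has run, with index 0 holding AP@[IoU=0.50:0.95]. `bbox_map` and `BestCheckpoint` are hypothetical names introduced here, and the demo uses stand-in objects instead of a real evaluator so that it runs without PyTorch.

```python
from types import SimpleNamespace


def bbox_map(coco_evaluator):
    """Return AP@[IoU=0.50:0.95] for boxes from a CocoEvaluator-like object.

    Assumption: the object exposes coco_eval["bbox"].stats, filled by summarize().
    """
    return float(coco_evaluator.coco_eval["bbox"].stats[0])


class BestCheckpoint:
    """Hypothetical helper: remembers the best metric and saves when it improves."""

    def __init__(self, save_fn, path="best_model.pth"):
        self.save_fn = save_fn  # e.g. torch.save in a real training script
        self.path = path
        self.best_metric = float("-inf")
        self.best_epoch = None

    def update(self, epoch, metric, state):
        """Save `state` if `metric` beats the best so far; return True if saved."""
        if metric > self.best_metric:
            self.best_metric = metric
            self.best_epoch = epoch
            self.save_fn(state, self.path)
            return True
        return False


# Demo with stand-ins: a list records what would have been written to disk.
saved = []
tracker = BestCheckpoint(save_fn=lambda state, path: saved.append((state, path)))

# AP values taken from the two evaluation summaries in the question.
for epoch, ap, ap50 in [(6, 0.210, 0.643), (7, 0.210, 0.649)]:
    fake_evaluator = SimpleNamespace(
        coco_eval={"bbox": SimpleNamespace(stats=[ap, ap50])}
    )
    tracker.update(epoch, bbox_map(fake_evaluator), state={"epoch": epoch})

print("best epoch:", tracker.best_epoch, "AP:", tracker.best_metric)
```

In the training loop this would become `tracker.update(epoch, bbox_map(evaluate(model, data_loader_test, device=device)), model.state_dict())` with `save_fn=torch.save`. For the two epochs logged in the question, AP@[0.50:0.95] is 0.210 both times, so the strict `>` comparison keeps epoch 6; comparing on AP@0.50 (0.643 vs 0.649) would pick epoch 7 instead.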
So, the answer is:
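One way to act on this, sketched under assumptions and not taken from the answer itself: record the mean training loss and a validation score (here AP, where higher is better) after every epoch and compare their trends. `diagnose_fit` is a hypothetical helper, it only compares the first and last recorded epoch, and the threshold `tol` is an arbitrary choice, so treat its labels as a rough hint and not a verdict.

```python
def diagnose_fit(train_losses, val_scores, tol=1e-3):
    """Heuristic label from per-epoch training loss and validation score.

    train_losses: mean training loss per epoch (lower is better).
    val_scores:   validation metric per epoch, e.g. AP (higher is better).
    """
    if len(train_losses) < 2 or len(train_losses) != len(val_scores):
        raise ValueError("need two or more epochs and equal-length histories")

    train_delta = train_losses[-1] - train_losses[0]
    val_delta = val_scores[-1] - val_scores[0]

    if train_delta >= -tol:
        # Training loss is not going down: the model is not learning further.
        return "training not improving"
    if val_delta < -tol:
        # Training keeps improving while validation gets worse.
        return "overfitting"
    if val_delta > tol:
        return "still improving"
    return "validation plateau"


# Averaged training loss and AP@[0.50:0.95] for epochs 6 and 7 from the logs.
train_losses = [0.4113, 0.4076]
val_scores = [0.210, 0.210]
print(diagnose_fit(train_losses, val_scores))
```

On the two logged epochs the training loss still drops slightly while AP stays at 0.210, which this heuristic labels a validation plateau. Two epochs are too few to say more; the full history from epoch 0 would give a clearer picture.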