SageMaker image classification: training accuracy improves but validation accuracy stays at random-guess level

Time: 2021-02-20 16:50:19

Tags: amazon-sagemaker image-classification

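During training, my validation accuracy always stays at 1/classes (random guessing) and never improves, while my training accuracy keeps getting better. I can train on the same dataset in Keras and it works well. My images are 400x100, and I have tried both with and without transfer learning. Something seems to be wrong with my setup.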

2021-02-20T08:55:51.543-07:00   [02/20/2021 15:55:50 INFO 140404211758208] Epoch[2] Train-accuracy=0.822706

2021-02-20T08:55:51.543-07:00   [02/20/2021 15:55:50 INFO 140404211758208] Epoch[2] Train-top_k_accuracy_3=0.991477

2021-02-20T08:55:51.543-07:00   [02/20/2021 15:55:50 INFO 140404211758208] Epoch[2] Time cost=194.629

2021-02-20T08:56:07.548-07:00   [02/20/2021 15:56:06 INFO 140404211758208] Epoch[2] Validation-accuracy=0.047033
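To make the "random guessing" claim concrete, here is a minimal sketch that compares the logged accuracies against the chance level for a balanced 25-class problem (`num_classes: 25` comes from the hyperparameter log in this question). The helper names and the `tolerance` value are illustrative assumptions, not part of SageMaker.

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1.0 / num_classes


def is_near_chance(accuracy: float, num_classes: int, tolerance: float = 0.02) -> bool:
    """True if accuracy is within `tolerance` of the chance level.

    The tolerance is an arbitrary illustrative threshold.
    """
    return abs(accuracy - chance_accuracy(num_classes)) <= tolerance


num_classes = 25          # from the hyperparameter log
train_acc = 0.822706      # Epoch[2] Train-accuracy
val_acc = 0.047033        # Epoch[2] Validation-accuracy

print("chance level:", chance_accuracy(num_classes))
print("validation near chance:", is_near_chance(val_acc, num_classes))
print("training near chance:", is_near_chance(train_acc, num_classes))
```

A validation score sitting at the chance level while training accuracy climbs usually suggests a problem with how the validation data or labels are fed in rather than ordinary overfitting, though the logs alone do not prove that.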

Hyperparameter settings:

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] Done creating record files...

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] use_pretrained_model: 1

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] multi_label: 0

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] Using pretrained model for initializing weights and transfer learning.

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] ---- Parameters ----

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] num_layers: 18

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] data type: <type 'numpy.float32'>

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] epochs: 99

2021-02-20T08:45:13.281-07:00   [02/20/2021 15:45:12 INFO 140404211758208] optimizer: adam

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] beta_1: 0.9

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] beta_2: 0.999

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] eps: 1e-08

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] learning_rate: 0.001

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] num_training_samples: 114077

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] mini_batch_size: 32

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] image_shape: 3,400,100

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] num_classes: 25

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] augmentation_type: None

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] kv_store: device

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] top_k: 3

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] checkpoint_frequency: 3

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] Using early stopping for training

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] Early stopping minimum epochs: 10

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] Early stopping patience: 10

2021-02-20T08:45:13.282-07:00   [02/20/2021 15:45:12 INFO 140404211758208] Early stopping tolerance: 0.01

Training image

Validation image

Edit: Switching to the RecordIO format did fix my problem. However, I believe the setup described in the question should have worked.
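The question does not identify why the original image-format setup failed. One check that may be worth running in that situation is whether the training and validation `.lst` files assign the same label index to the same class. The sketch below assumes the usual tab-separated `.lst` layout (index, label, relative path) and, as a further assumption, that the first path component is the class folder. The function names and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_lst(stream):
    """Parse rows of 'index<TAB>label<TAB>relative_path' into tuples."""
    rows = []
    for rec in csv.reader(stream, delimiter="\t"):
        if not rec:
            continue
        # Labels are sometimes written as floats such as "3.000000".
        rows.append((int(rec[0]), int(float(rec[1])), rec[-1]))
    return rows


def label_to_folders(rows):
    """Map each label index to the set of class folders it appears with."""
    mapping = defaultdict(set)
    for _, label, path in rows:
        # Assumption: the first path component names the class.
        folder = path.split("/")[0] if "/" in path else ""
        mapping[label].add(folder)
    return mapping


def find_mismatches(train_rows, val_rows):
    """Return labels whose class folders differ between the two files."""
    train_map = label_to_folders(train_rows)
    val_map = label_to_folders(val_rows)
    mismatches = {}
    for label in sorted(set(train_map) | set(val_map)):
        t, v = train_map.get(label, set()), val_map.get(label, set())
        if t != v:
            mismatches[label] = (t, v)
    return mismatches


# Hypothetical in-memory example; real files would be opened with open(...).
train = read_lst(io.StringIO("0\t0\tcat/a.jpg\n1\t1\tdog/b.jpg\n"))
val = read_lst(io.StringIO("0\t1\tcat/c.jpg\n1\t0\tdog/d.jpg\n"))
print(find_mismatches(train, val))
```

An empty result means the two files agree on the label-to-folder mapping. It does not rule out other causes, such as differences in how the two channels are preprocessed.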

0 answers:

No answers