Validation loss not changing with ResNet

Time: 2021-06-29 07:30:23

Tags: tensorflow machine-learning keras deep-learning resnet

So I have data of shape (25000, 178, 178, 3): 25000 samples, each with 3 different color channels (not RGB channels), of which about 21k samples have label 0 and the remaining 4k have label 1. Here is one of my sample data points:

array([[[[1.79844797e-01, 1.73587397e-01, 1.73587397e-01, ...,
          4.84393053e-02, 5.15680127e-02, 5.46967126e-02],
         [1.76716089e-01, 1.79844797e-01, 1.82973504e-01, ...,
          5.15680127e-02, 5.31323589e-02, 5.15680127e-02],
         [1.81409150e-01, 1.86102197e-01, 1.81409150e-01, ...,
          5.15680127e-02, 5.31323589e-02, 5.15680127e-02]]],


       [[[2.51065755e+00, 2.53197193e+00, 2.53197193e+00, ...,
          1.88543844e+00, 1.89964795e+00, 1.90675282e+00],
         [2.51776242e+00, 2.52486706e+00, 2.53197193e+00, ...,
          1.89964795e+00, 1.90675282e+00, 1.90675282e+00],
         [2.53197193e+00, 2.51776242e+00, 2.52486706e+00, ...,
          1.91385746e+00, 1.90675282e+00, 1.90675282e+00]]],


       [[[7.13270283e+00, 7.11016369e+00, 7.13270283e+00, ...,
          4.85625362e+00, 4.90133190e+00, 4.94641018e+00],
         [7.08762503e+00, 7.08762503e+00, 7.08762503e+00, ...,
          4.92387104e+00, 4.96894932e+00, 4.96894932e+00],
         [7.08762503e+00, 7.08762503e+00, 7.06508589e+00, ...,
          4.99148846e+00, 4.96894932e+00, 4.96894932e+00]]],
      dtype=float32)

Now, I first tried normalizing per color channel. Since each color channel is on a completely different scale, I normalize channel by channel as follows, where data_array is my whole dataset:

def nan(index):
    # Min-max normalize one channel in place over the whole dataset.
    channel = data_array[:, :, :, index]
    data_array[:, :, :, index] = (channel - np.min(channel)) / (np.max(channel) - np.min(channel))
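The same per-channel min-max normalization can be done for all channels at once with a vectorized reduction (a minimal sketch; the small random `data_array` here is a stand-in for the real (25000, 178, 178, 3) dataset):

```python
import numpy as np

# Small random stand-in for the real (25000, 178, 178, 3) dataset.
data_array = np.random.rand(8, 4, 4, 3).astype("float32") * 10.0

# Per-channel min and max over all samples and pixels; keepdims makes
# the shapes (1, 1, 1, 3) so they broadcast against data_array.
mins = data_array.min(axis=(0, 1, 2), keepdims=True)
maxs = data_array.max(axis=(0, 1, 2), keepdims=True)
data_array = (data_array - mins) / (maxs - mins)
```

After this, every channel independently spans [0, 1].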

Split into training, validation, and test sets:

rand_indices = np.random.permutation(len(data_array))
train_indices = rand_indices[0:19000]
valid_indices = rand_indices[19000:21000]
test_indices = rand_indices[21000:len(data_array)]

x_val = data_array[valid_indices, :]
y_val = EDR[valid_indices].astype('float')

x_train = data_array[train_indices, :]
y_train = EDR[train_indices].astype('float')

x_test = data_array[test_indices, :]
y_test = EDR[test_indices].astype('float')
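With a ~21k/4k class imbalance, it is worth checking how many positives land in each split, since a model that predicts all zeros already scores the majority-class proportion. A minimal sketch with synthetic stand-in labels (the real `EDR` array would replace them):

```python
import numpy as np

# Synthetic stand-in labels: 25000 samples, 4000 positives, as in the question.
EDR = np.zeros(25000, dtype='float32')
EDR[:4000] = 1.0
np.random.shuffle(EDR)

rand_indices = np.random.permutation(len(EDR))
train_indices = rand_indices[0:19000]
valid_indices = rand_indices[19000:21000]
test_indices = rand_indices[21000:]

# Count positives per split; predicting all zeros scores 1 minus this rate.
for name, idx in [('train', train_indices), ('val', valid_indices), ('test', test_indices)]:
    ones = int(EDR[idx].sum())
    print(f"{name}: {ones}/{len(idx)} positives ({ones / len(idx):.1%})")
```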

Then I fit an ImageDataGenerator on the training data like this:

gen = ImageDataGenerator(
            rotation_range=40,
            zoom_range=0.2,
            shear_range=0.2,
            width_shift_range=0.2,
            height_shift_range=0.2,
            fill_mode='nearest',
            horizontal_flip=True,
    )
gen.fit(x_train)
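A side note on `gen.fit`: it is only needed when the generator uses dataset-wide statistics (`featurewise_center`, `featurewise_std_normalization`, or `zca_whitening`); with only the geometric transforms above it has no effect. A minimal sketch of the featurewise case where `fit` does matter (small random stand-in data, not the real dataset):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Small random stand-in for the real training set.
x_train = np.random.rand(16, 32, 32, 3).astype("float32")

gen = ImageDataGenerator(featurewise_center=True,
                         featurewise_std_normalization=True,
                         rotation_range=40,
                         horizontal_flip=True)
gen.fit(x_train)  # computes the dataset mean/std used by the featurewise flags

batch = next(gen.flow(x_train, batch_size=8, shuffle=False))
```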

Then I train a ResNet on the data as follows:

img_height,img_width = 178, 178 
num_classes = 2

base_model = applications.resnet.ResNet101(weights= None, include_top=False, input_shape= (img_height,img_width,3))

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.7)(x)
predictions = Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)

initial_learning_rate = 0.001
def lr_step_decay(epoch, lr):
    drop_rate = 0.5
    epochs_drop = 10.0
    return initial_learning_rate * math.pow(drop_rate, math.floor(epoch/epochs_drop))

sgd = tf.keras.optimizers.SGD(lr = 0.001, momentum = 0.9, decay = 1e-6, nesterov=False)
opt_rms = optimizers.RMSprop(lr=0.001,decay=1e-6)

model.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics = ['accuracy'])
history = model.fit_generator(gen.flow(x_train, y_train, batch_size=64),
                              steps_per_epoch=64, epochs=30, verbose=1,
                              validation_data=(x_val, y_val),
                              callbacks=[LearningRateScheduler(lr_step_decay)])

This is how my model trains:

Epoch 1/30
64/64 [==============================] - 46s 713ms/step - loss: 0.5535 - accuracy: 0.8364 - val_loss: 6.0887 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 2/30
64/64 [==============================] - 43s 671ms/step - loss: 0.4661 - accuracy: 0.8562 - val_loss: 0.6467 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 3/30
64/64 [==============================] - 43s 673ms/step - loss: 0.4430 - accuracy: 0.8640 - val_loss: 0.4231 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 4/30
64/64 [==============================] - 45s 699ms/step - loss: 0.4327 - accuracy: 0.8674 - val_loss: 0.3895 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 5/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4482 - accuracy: 0.8559 - val_loss: 0.3607 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 6/30
64/64 [==============================] - 43s 678ms/step - loss: 0.3857 - accuracy: 0.8677 - val_loss: 0.4244 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 7/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4308 - accuracy: 0.8623 - val_loss: 0.4049 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 8/30
64/64 [==============================] - 43s 677ms/step - loss: 0.3776 - accuracy: 0.8711 - val_loss: 0.3580 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 9/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4005 - accuracy: 0.8672 - val_loss: 0.3689 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 10/30
64/64 [==============================] - 43s 676ms/step - loss: 0.3977 - accuracy: 0.8828 - val_loss: 0.3513 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 11/30
64/64 [==============================] - 43s 675ms/step - loss: 0.4394 - accuracy: 0.8682 - val_loss: 0.3491 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 12/30
64/64 [==============================] - 43s 676ms/step - loss: 0.3702 - accuracy: 0.8779 - val_loss: 0.3676 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 13/30
64/64 [==============================] - 43s 678ms/step - loss: 0.3904 - accuracy: 0.8706 - val_loss: 0.3621 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 14/30
64/64 [==============================] - 43s 677ms/step - loss: 0.3579 - accuracy: 0.8765 - val_loss: 0.3483 - val_accuracy: 0.8760 - lr: 5.0000e-04

My validation accuracy does not change at all; it stays constant. And it is probably predicting everything as 0, because given the split (248 ones out of 2k validation records), predicting everything as 0 would yield exactly that validation accuracy. Can someone tell me what I am doing wrong here?

A sample plot from one file with 5 time dims (I only use 1 for training) and 1 data channel:

[image: sample plot]

1 Answer:

Answer 0 (score: 0):

Your observation is indeed correct: the network is not learning anything.

Make sure your dataset is labeled correctly and that the data is fed in correctly. Also, ask yourself this question: is a 178x178 resolution sufficient for the "other" class I am trying to detect? If you have already gone through these steps, proceed with the suggestions below.

I would start by lowering the learning rate to 0.0001-0.00001 (although at that point learning may converge too slowly).

Also, try removing Dropout() entirely to see whether your network can learn anything at all. Dropout() is not needed at this point in the investigation, and with such a high dropout rate (0.7) it is actually hindering learning.
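The two suggestions above can be sketched against the question's own model code (a sketch only: the head simply omits Dropout, and the exact learning rate within the suggested range is something to tune):

```python
import tensorflow as tf
from tensorflow.keras import applications, Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

img_height, img_width = 178, 178

base_model = applications.resnet.ResNet101(
    weights=None, include_top=False, input_shape=(img_height, img_width, 3))

x = GlobalAveragePooling2D()(base_model.output)
# Dropout removed while diagnosing whether the network can learn at all.
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Learning rate lowered from 0.001 into the suggested 0.0001-0.00001 range.
sgd = tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
```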
