keras: recording layer-wise learning rates

Time: 2018-11-05 19:21:24

Tags: python tensorflow keras keras-layer

I am implementing a simple CNN in Keras and trying to set layer-wise learning rates in Adam. I referred to this tutorial. The modified Adam is shown below:

from keras import backend as K
from keras.legacy import interfaces
from keras.optimizers import Optimizer

class Adam_lr_mult(Optimizer):
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=None, decay=0., amsgrad=False,
                 multipliers=None, debug_verbose=True,**kwargs):
        ...'''Omitted'''

        self.multipliers = multipliers
        self.layerwise_lr={}    # record layer-wise lr
        self.debug_verbose = debug_verbose

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            lr *= (1. / (1. + self.decay * K.cast(self.iterations,
                                                  K.dtype(self.decay))))

        t = K.cast(self.iterations, K.floatx()) + 1
        lr_t = lr * (K.sqrt(1. - K.pow(self.beta_2, t)) /
                     (1. - K.pow(self.beta_1, t)))

        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        if self.amsgrad:
            vhats = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        else:
            vhats = [K.zeros(1) for _ in params]

        self.weights = [self.iterations] + ms + vs + vhats

        for p, g, m, v, vhat in zip(params, grads, ms, vs, vhats):

            # Learning rate multipliers
            if self.multipliers:
                multiplier = [mult for mult in self.multipliers if mult in p.name]
                if self.debug_verbose:
                    print('parameter: ',p.name)
            else:
                multiplier = None

            if multiplier:
                new_lr_t = lr_t * self.multipliers[multiplier[0]]
                self.layerwise_lr[multiplier[0]] = K.get_value(new_lr_t)

                if self.debug_verbose:
                    print('Setting {} to learning rate : {}'.format(multiplier[0], new_lr_t))
                    print('learning rate:',K.get_value(new_lr_t))
                    print('Dict:',self.layerwise_lr)
                    print('\n')
            else:
                new_lr_t = lr_t
                self.layerwise_lr[p.name.split('/')[0]] = K.get_value(new_lr_t)

                if self.debug_verbose:
                    print('No change in learning rate : {}'.format(p.name))
                    print('learning rate:',K.get_value(new_lr_t))
                    print('Dict:',self.layerwise_lr)
                    print('\n')

            ...'''Omitted'''

        print('***__Hello__***')
        return self.updates

In addition, I use the ReduceLROnPlateau and CSVLogger callbacks to record the learning rate. To log more information about the layer-wise learning rates, I also modified ReduceLROnPlateau:

from keras.callbacks import Callback

class ReduceLROnPlateau_lr_mult(Callback):
    def __init__(self, monitor='val_loss', factor=0.1, patience=10,
                 verbose=0, mode='auto', min_delta=1e-4, cooldown=0, min_lr=0,
                 **kwargs):
        ...'''Omitted'''

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        logs['lr'] = K.get_value(self.model.optimizer.lr)
        logs.update(self.model.optimizer.layerwise_lr) # the only line I added
        ...'''Omitted'''

To test the modified Adam and ReduceLROnPlateau, I use the MNIST dataset and build a simple CNN with only 4 convolutional layers, 4 batch_normalization layers, and 1 dense layer. The code and results are shown below:
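
The model code itself is not included in the post; a minimal sketch of the described architecture, assuming Keras's default layer names (conv2d_1 … dense_1) and illustrative filter counts, might look like this:

# Hypothetical model matching the description: 4 conv layers, 4 batch norm layers, 1 dense layer.
# Filter counts and kernel sizes are assumptions; layer names rely on Keras's default numbering.
import numpy as np   # used below for np.sqrt(0.1)
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # conv2d_1
model.add(BatchNormalization())                                            # batch_normalization_1
model.add(Conv2D(32, (3, 3), activation='relu'))                           # conv2d_2
model.add(BatchNormalization())                                            # batch_normalization_2
model.add(Conv2D(64, (3, 3), activation='relu'))                           # conv2d_3
model.add(BatchNormalization())                                            # batch_normalization_3
model.add(Conv2D(64, (3, 3), activation='relu'))                           # conv2d_4
model.add(BatchNormalization())                                            # batch_normalization_4
model.add(Flatten())
model.add(Dense(10, activation='softmax'))                                 # dense_1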

# Learning multiplier
lr_multipliers = {}
lr_multipliers['conv2d_1'] = 0.8
lr_multipliers['batch_normalization_1'] = 0.8
lr_multipliers['conv2d_2'] = 0.6
lr_multipliers['conv2d_3'] = 0.4
lr_multipliers['conv2d_4'] = 0.2

# Adam with layer-wise lr
adam_lr_mult = Adam_lr_mult(multipliers=lr_multipliers)

# ReduceLROnPlateau with layer-wise lr dict
lr_reducer_mult = ReduceLROnPlateau_lr_mult(monitor='val_loss',factor=np.sqrt(0.1), 
                      cooldown=0, patience=5, min_lr=0.00001, mode='auto', verbose=1)

model.compile(loss='categorical_crossentropy', optimizer=adam_lr_mult, metrics=['accuracy'])

model.fit(...,verbose=2,
             callbacks=[lr_reducer_mult, early_stopper, csv_logger, ...])
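
The early_stopper and csv_logger callbacks above are not defined in the post; presumably they are the standard Keras EarlyStopping and CSVLogger callbacks, e.g.:

# Assumed definitions for the callbacks referenced in model.fit (not shown in the post)
from keras.callbacks import EarlyStopping, CSVLogger

early_stopper = EarlyStopping(monitor='val_loss', patience=10, verbose=1)  # patience is an assumption
csv_logger = CSVLogger('training_log.csv')                                 # filename is an assumption

Since lr_reducer_mult is listed before csv_logger, its on_epoch_end runs first, so any keys it adds to logs (including the layerwise_lr entries) are available when CSVLogger writes the row for that epoch.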

The results show:

...'''Omitted'''
parameter:  batch_normalization_4/beta:0
No change in learning rate : batch_normalization_4/beta:0
learning rate: 0.00031623512
Dict: {'conv2d_1': 0.00025298807, 'batch_normalization_1': 0.00025298807,...}

parameter:  dense_1/kernel:0
No change in learning rate : dense_1/kernel:0
learning rate: 0.00031623512
Dict: {'conv2d_1': 0.00025298807, 'batch_normalization_1': 0.00025298807,...}

parameter:  dense_1/bias:0
No change in learning rate : dense_1/bias:0
learning rate: 0.00031623512
Dict: {'conv2d_1': 0.00025298807, 'batch_normalization_1': 0.00025298807,...}

***__Hello__***
Train on 48000 samples, validate on 12000 samples
Epoch 1/100
 - 4s - loss: 0.4517 - acc: 0.9098 - val_loss: 0.2027 - val_acc: 0.9572
Epoch 2/100
 - 2s - loss: 0.1029 - acc: 0.9827 - val_loss: 0.1374 - val_acc: 0.9718
Epoch 3/100
 - 2s - loss: 0.0739 - acc: 0.9905 - val_loss: 0.0929 - val_acc: 0.9833
Epoch 4/100
 - 2s - loss: 0.0604 - acc: 0.9939 - val_loss: 0.0815 - val_acc: 0.9865
Epoch 5/100
 - 2s - loss: 0.0513 - acc: 0.9959 - val_loss: 0.0785 - val_acc: 0.9864
Epoch 6/100
 - 2s - loss: 0.0448 - acc: 0.9979 - val_loss: 0.1081 - val_acc: 0.9759
Epoch 7/100
 - 2s - loss: 0.0405 - acc: 0.9984 - val_loss: 0.0752 - val_acc: 0.9864
Epoch 8/100
 - 2s - loss: 0.0368 - acc: 0.9990 - val_loss: 0.1382 - val_acc: 0.9666
Epoch 9/100
 - 2s - loss: 0.0337 - acc: 0.9996 - val_loss: 0.0659 - val_acc: 0.9890
Epoch 10/100
 - 2s - loss: 0.0314 - acc: 0.9998 - val_loss: 0.0746 - val_acc: 0.9860
...

Epoch 25/100
 - 2s - loss: 0.0177 - acc: 0.9991 - val_loss: 0.1212 - val_acc: 0.9731

Epoch 00025: ReduceLROnPlateau reducing learning rate to 0.00031622778103685084.
Epoch 26/100
 - 2s - loss: 0.0155 - acc: 0.9998 - val_loss: 0.0446 - val_acc: 0.9915
Epoch 27/100
 - 2s - loss: 0.0146 - acc: 1.0000 - val_loss: 0.0422 - val_acc: 0.9926

My question:

I am not sure where I am going wrong: logs['lr'] changes in the CSV file, but the "layerwise_lr" dictionary does not. To track down the problem, I added the line print('***__Hello__***') in the modified Adam, and it appears only once. This confuses me: the information about setting the learning rates is printed only once, before the first epoch, and never again. Can anyone give me some advice? Many thanks!

0 Answers:

No answers yet.