Unsupervised encoding in Keras with a custom loss

Date: 2020-01-23 11:27:37

Tags: python tensorflow machine-learning keras

I am trying to model time-varying covariance with an RNN in Keras, where the covariance of a signal Y is decomposed into a time-varying weighted sum of a fixed basis set: C_Y^t = SUM_i^npriors (alpha_i^t * beta_i), where the beta_i are the fixed basis matrices and the alpha_i^t are the terms I want to infer.
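
For concreteness, here is a toy numpy illustration of this decomposition (the alpha weights and basis matrices below are made up purely for illustration):

import numpy as np

npriors, nchans, T = 2, 5, 3

# Fixed basis set beta_i: two arbitrary positive semi-definite matrices
beta = np.stack([np.eye(nchans), 0.5 * np.ones((nchans, nchans))])

# Time-varying weights alpha_i^t (these are what the RNN should infer)
alpha = np.abs(np.random.randn(T, npriors))

# C_Y^t = SUM_i^npriors (alpha_i^t * beta_i): one covariance matrix per time step
C_Y = np.einsum('ti,ijk->tjk', alpha, beta)  # shape (T, nchans, nchans)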

As the cost function I am (currently) using the negative log-likelihood, where the likelihood is a zero-mean MVN whose covariance is the inferred C_Y^t from above: likelihood = MVN(Y; 0, C_Y^t). Once this is implemented correctly, I will use the reparameterization trick together with a KL divergence.

I do not want to explicitly reconstruct the data as in a classic autoencoder setup - I only want to infer the alpha terms that best describe the time-varying covariance dynamics. So when the model is called, the outputs should just be alpha_mu and alpha_sigma:

alpha_model_net = tf.keras.Model(inputs=[inputs_layer],
                                  outputs= [alpha_mu,alpha_sigma], 
                                  name='Alpha_MODEL')

But since I do not know a priori what these alpha terms are, it is hard to know what [alpha_mu_predict, alpha_sigma_predict] should be when calling alpha_model_net.fit(Y_observed, [alpha_mu_predict, alpha_sigma_predict]) - this has to be unsupervised.

So my question is in two parts:

  1. What should I pass in as [alpha_mu_predict, alpha_sigma_predict] if I do not know them?
  2. Am I actually right to use samples from the alpha distribution, i.e. alpha_ast, inside the custom cost function, as in my attempted implementation shown here?

I have had a go at implementing this myself. The key parts of my code are shown below, and a complete example with data simulation can be found on a Google Colab doc here.

The model

mini_batch_length = 10 # feature length
nchans = 5 # number of features/channels of observed data, Y
nunits = 10 # number of GRU units
npriors = 2 # i.e. how many basis functions we have

inputs_layer = layers.Input(shape=(mini_batch_length,nchans), name='Y_input')
output,state = tf.compat.v1.keras.layers.CuDNNGRU(nunits, # number of units 
                                          return_state=True,
                                          return_sequences=True,
                                          name='uni_INF_GRU')(inputs_layer)

alpha_mu = tf.keras.layers.Dense(npriors,activation='linear',name='alpha_mu')(output)                                  
alpha_sigma = tf.keras.layers.Dense(npriors,activation='linear',name='alpha_sigma')(output)                                  

# use reparameterization trick to push the sampling out as input
alpha_ast = layers.Lambda(sampling, 
                          name='alpha_ast')([alpha_mu, alpha_sigma])

# instantiate alpha MODEL network:
alpha_model_net = tf.keras.Model(inputs=[inputs_layer],
                                  outputs= [alpha_ast], 
                                  name='Alpha_MODEL')

tf.keras.utils.plot_model(alpha_model_net, to_file='vae_mlp_encoder.png', show_shapes=True)
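
The sampling function passed to the Lambda layer is not reproduced here; a minimal sketch of the usual reparameterization-trick sampler, assuming alpha_sigma is treated as a log-variance, would look like this:

def sampling(args):
    """Reparameterization trick: alpha_ast = mu + exp(0.5 * log_var) * eps."""
    alpha_mu, alpha_sigma = args  # alpha_sigma assumed to parameterize a log-variance
    eps = tf.keras.backend.random_normal(shape=tf.shape(alpha_mu))
    return alpha_mu + tf.keras.backend.exp(0.5 * alpha_sigma) * eps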

Cost function

# Assumes: import tensorflow_probability as tfp; tfd = tfp.distributions
def vae_loss(Y_portioned, alpha_ast):
  """
  Our cost function is just the NLL

  The likelihood is a multivariate normal with zero mean and time-varying
  covariance:
                  P(Y|alpha^t) = MVN(Y; 0, C_Y^t)
  where
                      C_Y^t  = SUM_i^npriors (alpha_ast_i^t beta_i)

  Y is our observed data
  alpha_ast_i^t are our samples from the inferred parameters (mu,sigma)
  beta_i are the basis functions (corresponding to covariance_matrix below)
  and (perhaps obviously) are not trainable.                                    
  """
  # Alphas need to end up being of dimension (?,mini_batch_length,npriors,1,1),
  # and need to undergo softplus transformation:
  alpha_ext = tf.keras.backend.expand_dims(tf.keras.backend.expand_dims(
    tf.keras.activations.softplus(alpha_ast),
    axis=-1),axis=-1)

  # Covariance basis set
  # This needs to be of dim [npriors, sensors, sensors]:
  covariance_basis = np.tile(np.zeros((nchans,nchans)),(npriors,1,1)).astype('float32')
  covariance_basis[0,0,0] = 1
  covariance_basis[1,1,1] = 1

  # Covariance basis functions need to be of dimension [1,1, npriors, sensors, sensors]
  covariance_ext = tf.reshape(covariance_basis,(1,1,npriors,nchans,nchans))

  # Do the multiplicative sum over the npriors dimension:
  cov_arg = tf.reduce_sum(tf.multiply(alpha_ext,covariance_ext),2)
  safety_add = 1e-6*np.eye(nchans, nchans).astype('float32') # small diagonal jitter for numerical stability
  cov_arg = cov_arg + safety_add 

  mvn=tfd.MultivariateNormalFullCovariance(
  loc = np.zeros((mini_batch_length,nchans)).astype('float32'), 
  covariance_matrix=cov_arg,
  allow_nan_stats=False)

  # Evaluate the -log(MVN) at the current batch of data. We add a tiny constant
  # to avoid any NaN or inf troubles
  loss = tf.reduce_sum(-tf.math.log(mvn.prob(Y_portioned)+1e-9))

  return loss

Fitting the model

opt = tf.keras.optimizers.Adam(lr=0.001)
alpha_model_net.compile(optimizer=opt, loss=vae_loss)

history = alpha_model_net.fit(Y_portioned, # Observed data
                              Y_portioned, # ???
                              verbose=1,
                              shuffle=True,
                              epochs=100,
                              batch_size=400)

Many thanks in advance - please let me know if I have missed any key details.

Using the TensorFlow 2.1.0 backend.

Update 1: I just used the add_loss function to compute the NLL with tensors. This now seems to work, and I do not need to specify a nuisance y in model.fit(x, y). I will update again if this turns out to be incorrect.

Example model:

inputs_layer = layers.Input(shape=(mini_batch_length,nchans), name='Y_portioned_in')
output,state = tf.compat.v1.keras.layers.CuDNNGRU(nunits, # number of units 
                                          return_state=True,
                                          return_sequences=True,
                                          name='uni_INF_GRU')(inputs_layer)

dense_layer_mu = tf.keras.layers.Dense(npriors,activation='linear')(output)                                  
dense_layer_sigma = tf.keras.layers.Dense(npriors,activation='linear')(output)                                  

alpha_ast = layers.Lambda(sampling, 
                          name='alpha_ast')([dense_layer_mu, dense_layer_sigma])

model = tf.keras.Model(inputs=[inputs_layer], outputs=[dense_layer_mu])

# Construct your custom loss as a tensor
loss = my_beautiful_custom_loss(alpha_ast,inputs_layer,npriors,nchans)

# Add loss to model
model.add_loss(loss)

# Compile without specifying a loss
opt = tf.keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=opt)

history=model.fit(Y_portioned, # Input or "Y_true"
                    verbose=1,
                    shuffle=True,
                    epochs=400,
                    batch_size=200)
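
For reference, my_beautiful_custom_loss is not shown above. Assuming it is essentially the vae_loss from earlier rewritten to take the relevant tensors and dimensions as arguments, a sketch would be:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def my_beautiful_custom_loss(alpha_ast, Y_portioned, npriors, nchans):
    """NLL of Y under a zero-mean MVN whose covariance is the alpha-weighted basis sum."""
    # Softplus the sampled alphas and expand to (?, time, npriors, 1, 1)
    alpha_ext = tf.keras.backend.expand_dims(tf.keras.backend.expand_dims(
        tf.keras.activations.softplus(alpha_ast), axis=-1), axis=-1)

    # Fixed covariance basis set reshaped to (1, 1, npriors, nchans, nchans)
    covariance_basis = np.zeros((npriors, nchans, nchans), dtype='float32')
    covariance_basis[0, 0, 0] = 1
    covariance_basis[1, 1, 1] = 1
    covariance_ext = tf.reshape(covariance_basis, (1, 1, npriors, nchans, nchans))

    # Weighted sum over the npriors dimension, plus a small diagonal jitter
    cov_arg = tf.reduce_sum(tf.multiply(alpha_ext, covariance_ext), 2)
    cov_arg = cov_arg + 1e-6 * np.eye(nchans, dtype='float32')

    mvn = tfd.MultivariateNormalFullCovariance(
        loc=tf.zeros(nchans),
        covariance_matrix=cov_arg,
        allow_nan_stats=False)

    # Negative log-likelihood of the observed data under the time-varying MVN
    return tf.reduce_sum(-tf.math.log(mvn.prob(Y_portioned) + 1e-9))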

1 Answer:

Answer 0 (score: 0)

Not sure this is the most sensible way to do it, but I used the add_loss function to solve this.

I will update my original question with the full implementation.