TensorFlow custom training - ValueError: Variable dense/kernel/Adam/ does not exist?

Date: 2019-05-15 15:00:07

Tags: python tensorflow

I am trying to build a very simple NN model for a classification task, writing the training loop myself rather than using the built-in Keras one. I load my dataset with tf.data.Dataset and train the model in mini-batches. At the same time, I want to print the model's results on a validation dataset, so I try to reuse the variables. My model is as follows:

def get_loss(prediction, label):
    return tf.losses.softmax_cross_entropy(tf.expand_dims(label, -1), prediction)


def make_train_op(optimizer, loss):
    apply_gradient_op = optimizer.minimize(loss,)
    return apply_gradient_op

class Model:

    def __init__(self):
        self.model = tf.keras.Sequential([
            tf.keras.layers.Dense(32, input_shape=(3,), activation=tf.keras.activations.relu),
            tf.keras.layers.Dense(128, input_shape=(64,), activation=tf.keras.activations.relu),
            tf.keras.layers.Dense(1, input_shape=(128,), activation=tf.keras.activations.softmax)
        ])

    def __call__(self, inp, is_train=True):
        return self.model(inp.feature), inp.label
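
For reference, InputPipe wraps a tf.data.Dataset behind an initializable iterator; a minimal sketch of it follows (the real implementation is not shown here; the names feature, label and init_sess are inferred from how it is used below, and the placeholder data is illustrative only):

class InputPipe:
    """Hypothetical stand-in for the actual input pipeline."""

    def __init__(self, is_train=True):
        # Placeholder tensors; the real pipeline would read the actual dataset.
        features = tf.random.uniform((100, 3))
        labels = tf.random.uniform((100,))
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        if is_train:
            dataset = dataset.shuffle(100).repeat()
        dataset = dataset.batch(8)
        self._iterator = dataset.make_initializable_iterator()
        self.feature, self.label = self._iterator.get_next()

    def init_sess(self, sess):
        # The initializable iterator must be initialized inside the session.
        sess.run(self._iterator.initializer)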

Then I try to train the model as follows:

model = Model()
optimizer = tf.train.AdamOptimizer()
init = tf.global_variables_initializer()
global_step = tf.train.get_or_create_global_step()

with tf.variable_scope('input', reuse=True):
    training_inp = InputPipe()
    validate_inp = InputPipe(is_train=False)

scope = tf.get_variable_scope()
training_prediction, true_train_y = model(training_inp)
scope.reuse_variables()

total_instances = data_size * n_repeats
steps_per_epoch = data_size // batch_size if data_size % batch_size == 0 else data_size // batch_size + 1

with tf.Session() as sess:
    sess.run(init)
    training_inp.init_sess(sess)
    list_grads = []
    for epoch in range(n_repeats):
        tqr = range(steps_per_epoch)

        for _ in tqr:
            loss = get_loss(training_prediction, true_train_y)
            sess.run(make_train_op(optimizer, loss))

However, optimizer.minimize(loss) raises an exception:


ValueError: Variable dense/kernel/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

UPDATE:

When I call get_loss and make_train_op outside the loop, it raises a different error, a FailedPreconditionError, even though I have already initialized all the variables:


FailedPreconditionError (see above for traceback): Error while reading resource variable beta2_power from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/beta2_power/class tensorflow::Var does not exist.
[[node Adam/update_dense_2/kernel/ResourceApplyAdam/ReadVariableOp_1 (defined at D:/00程序/python_ai/model/traffic_prediction_1/trainer_test_1.py:16)]]

Line 16 is:

apply_gradient_op = optimizer.minimize(loss, )

1 Answer:

Answer 0 (score: 1):

I think the problem is that you are calling get_loss and make_train_op inside the training loop, which creates a new loss op, and a new set of Adam slot variables, on every iteration; since scope.reuse_variables() has put the scope into reuse mode, tf.get_variable() refuses to create those slot variables, which is exactly the ValueError you see. Build the ops once, outside the loop, instead:

model = Model()
optimizer = tf.train.AdamOptimizer()
global_step = tf.train.get_or_create_global_step()

with tf.variable_scope('input', reuse=True):
    training_inp = InputPipe()
    validate_inp = InputPipe(is_train=False)

training_prediction, true_train_y = model(training_inp)
loss = get_loss(training_prediction, true_train_y)
train_op = make_train_op(optimizer, loss)

# Build the initializer only after minimize(), so that it also covers the
# slot variables (m, v, beta1_power, beta2_power) that Adam adds to the graph.
init = tf.global_variables_initializer()

total_instances = data_size * n_repeats
steps_per_epoch = data_size // batch_size if data_size % batch_size == 0 else data_size // batch_size + 1

with tf.Session() as sess:
    sess.run(init)
    training_inp.init_sess(sess)
    list_grads = []
    for epoch in range(n_repeats):
        tqr = range(steps_per_epoch)
        for _ in tqr:
            sess.run(train_op)
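
The placement of the initializer also explains the FailedPreconditionError from your update: it is the minimize() call that adds Adam's slot variables (beta1_power, beta2_power, and the per-weight m/v slots) to the graph, so an init op built before minimize() will not cover them. A quick sanity check, assuming TF 1.x graph mode (not part of the original code):

# After sess.run(init) this should print an empty array; any names listed
# (e.g. beta1_power, beta2_power) belong to variables created after the
# init op was built and are therefore still uninitialized.
print(sess.run(tf.report_uninitialized_variables()))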