Question

I am new to TensorFlow and I'm having a hard time understanding how the computations work. I couldn't find an answer to my question online.

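In the code below, the last time I print `d` inside the for loop of the `train_neural_net()` function, I expect the values to be the same as the ones I get when I print `test_distance.eval`. But they are very different. Can anyone tell me why this happens? Isn't TensorFlow supposed to cache the variable values learned in the for loop and use them when running `test_distance.eval`?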

```
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.AdamOptimizer)

# x1, x2, x3, x4 and y (fed via feed_dict), n_nodes_hl1, n_nodes_hl2, vector_size,
# and train_x1, train_x2, train_y are defined elsewhere in the program (not shown).

def neural_network_model1(data):
    nn1_hidden_1_layer = {'weights': tf.Variable(tf.random_normal([5, n_nodes_hl1])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    nn1_hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    nn1_output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, vector_size])), 'biasses': tf.Variable(tf.random_normal([vector_size]))}

    nn1_l1 = tf.add(tf.matmul(data, nn1_hidden_1_layer["weights"]), nn1_hidden_1_layer["biasses"])
    nn1_l1 = tf.sigmoid(nn1_l1)

    nn1_l2 = tf.add(tf.matmul(nn1_l1, nn1_hidden_2_layer["weights"]), nn1_hidden_2_layer["biasses"])
    nn1_l2 = tf.sigmoid(nn1_l2)

    nn1_output = tf.add(tf.matmul(nn1_l2, nn1_output_layer["weights"]), nn1_output_layer["biasses"])

    return nn1_output

def neural_network_model2(data):
    nn2_hidden_1_layer = {'weights': tf.Variable(tf.random_normal([5, n_nodes_hl1])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    nn2_hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    nn2_output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, vector_size])), 'biasses': tf.Variable(tf.random_normal([vector_size]))}

    nn2_l1 = tf.add(tf.matmul(data, nn2_hidden_1_layer["weights"]), nn2_hidden_1_layer["biasses"])
    nn2_l1 = tf.sigmoid(nn2_l1)

    nn2_l2 = tf.add(tf.matmul(nn2_l1, nn2_hidden_2_layer["weights"]), nn2_hidden_2_layer["biasses"])
    nn2_l2 = tf.sigmoid(nn2_l2)

    nn2_output = tf.add(tf.matmul(nn2_l2, nn2_output_layer["weights"]), nn2_output_layer["biasses"])

    return nn2_output

def train_neural_net():
    # Training graph
    prediction1 = neural_network_model1(x1)
    prediction2 = neural_network_model2(x2)

    distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(prediction1, prediction2)), reduction_indices=1))
    cost = tf.reduce_mean(tf.multiply(y, distance))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 500

    # Test graph
    test_result1 = neural_network_model1(x3)
    test_result2 = neural_network_model2(x4)
    test_distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(test_result1, test_result2)), reduction_indices=1))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            _, d = sess.run([optimizer, distance], feed_dict = {x1: train_x1, x2: train_x2, y: train_y})
            print("Epoch", epoch, "distance", d)

        print("test distance", test_distance.eval({x3: train_x1, x4: train_x2}))

train_neural_net()
```

Answer 1

Every time you call the function `neural_network_model1()` or `neural_network_model2()`, a new set of variables is created, so there are four sets of variables in total.
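As a framework-free analogy (plain Python, not TensorFlow), here is a hypothetical `make_model` function that, like `neural_network_model1`, allocates fresh random parameters on every call. Calling it four times, as `train_neural_net()` effectively does, yields four independent parameter sets, and changing one set has no effect on the others.

```python
import random

def make_model(rng):
    """Hypothetical stand-in for neural_network_model1/2.

    Every call allocates a brand-new parameter dict, the way every
    tf.Variable(...) call in the question creates a brand-new variable.
    """
    params = {"w": rng.gauss(0.0, 1.0), "b": rng.gauss(0.0, 1.0)}

    def forward(x):
        # Linear model that uses only the parameters created by THIS call.
        return params["w"] * x + params["b"]

    return forward, params

rng = random.Random(0)
all_param_sets = []

# Four calls, mirroring prediction1, prediction2, test_result1, test_result2.
for _ in range(4):
    _forward, params = make_model(rng)
    all_param_sets.append(params)

print(len(all_param_sets))                     # 4
print(all_param_sets[0] is all_param_sets[2])  # False

# "Training" the first set leaves the third (test-time) set untouched.
snapshot = dict(all_param_sets[2])
all_param_sets[0]["w"] += 1.0
print(all_param_sets[2] == snapshot)           # True
```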

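The call to `sess.run(tf.global_variables_initializer())` initializes all four sets of variables.
When you train in the for loop, you only update the first two sets of variables, which were created by these lines: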
```
prediction1 = neural_network_model1(x1)
prediction2 = neural_network_model2(x2)
```
When you evaluate with `test_distance.eval()`, the tensor `test_distance` depends only on the variables in the last two sets, which were created by these lines:
```
test_result1 = neural_network_model1(x3)
test_result2 = neural_network_model2(x4)
```
These variables are never updated in the training loop, so the evaluation result is based on their random initial values.
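The effect the question describes can be reproduced without TensorFlow. The sketch below (plain Python, with a hypothetical linear model and made-up data `y = 2x + 1`) trains one parameter set with ordinary gradient descent, then evaluates the same inputs twice: once with a separate, never-trained parameter set (what `test_distance` does in the question) and once with the trained set. Only the second reproduces the training outputs.

```python
import random

def init_params(rng):
    # Fresh random parameters, analogous to one set of tf.Variable objects.
    return {"w": rng.gauss(0.0, 1.0), "b": rng.gauss(0.0, 1.0)}

def predict(params, x):
    return params["w"] * x + params["b"]

def train(params, xs, ys, lr=0.05, epochs=500):
    """Plain gradient descent on mean squared error; updates params in place."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (predict(params, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(params, x) - y) for x, y in zip(xs, ys)) / n
        params["w"] -= lr * grad_w
        params["b"] -= lr * grad_b

rng = random.Random(0)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]       # made-up data: y = 2x + 1

trained = init_params(rng)      # like the variables behind prediction1/2
untrained = init_params(rng)    # like the variables behind test_result1/2
train(trained, xs, ys)

train_out = [predict(trained, x) for x in xs]
test_out_wrong = [predict(untrained, x) for x in xs]  # separate, never-trained set
test_out_right = [predict(trained, x) for x in xs]    # reuses the trained set

print([round(p, 3) for p in train_out])
print(test_out_wrong == train_out)
print(test_out_right == train_out)
```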

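TensorFlow provides a way to share weights across multiple calls to the same function using `with tf.variable_scope(...):` blocks. For more information on how to use them, see the tutorial on variables and sharing on the TensorFlow website.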

Answer 2

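You don't need to define two functions to build the models. You can use `tf.name_scope` and pass the model name into the function so that it is used as a prefix for the variable declarations.

Also, you defined two distance tensors: the first is `distance` and the second is `test_distance`. But your model learns from the training data to minimize `cost`, which depends only on the first one. Therefore `test_distance` is never used in training, and the model associated with it never learns anything! Likewise, you don't need two distance functions; one is enough. When you want the training distance, feed it the training data, and when you want the test distance, feed it the test data. In any case, if you wanted the second distance to work as written, you would have to declare another `optimizer` for it and train it the same way as the first.

In addition, keep in mind that a model learns from both its initial values and the training data. Even if you feed exactly the same training batches to two models, you cannot expect them to end up with exactly the same features, because their initial weights differ, which can lead them into different local minima of the error surface.

Finally, note that every time you call `neural_network_model1` or `neural_network_model2`, you generate new weights and biases, because `tf.Variable` creates new variables each time.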
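The restructuring suggested above (one model-building function parameterized by a name prefix, each model built once, and a single distance function fed with whichever data you want) can be sketched in plain Python. The `build_model` and `distance` functions and the input pairs below are hypothetical illustrations, not TensorFlow code; the prefix only namespaces parameter names, similar in spirit to passing a name into `tf.name_scope`.

```python
import math
import random

def build_model(prefix, rng, n_in=2, n_out=2):
    """Hypothetical single builder replacing neural_network_model1/2.

    `prefix` only namespaces the parameter names; each call still creates
    its own parameters, so each model should be built exactly once.
    """
    params = {
        f"{prefix}/weights": [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)],
        f"{prefix}/biases": [rng.gauss(0.0, 1.0) for _ in range(n_out)],
    }

    def forward(x):
        w = params[f"{prefix}/weights"]
        b = params[f"{prefix}/biases"]
        # x @ w + b for a single input vector x
        return [sum(x[i] * w[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]

    return forward, params

def distance(a, b):
    # One Euclidean distance function, reused for any pair of outputs.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

rng = random.Random(0)
model_a, params_a = build_model("nn1", rng)  # built ONCE
model_b, params_b = build_model("nn2", rng)  # built ONCE

train_pair = ([1.0, 0.0], [0.0, 1.0])        # made-up example inputs
test_pair = ([0.5, 0.5], [0.5, 0.5])

# Same models, same distance function; only the fed data changes.
train_d = distance(model_a(train_pair[0]), model_b(train_pair[1]))
test_d = distance(model_a(test_pair[0]), model_b(test_pair[1]))

# Feeding the training data again through the same models reproduces train_d.
recomputed = distance(model_a(train_pair[0]), model_b(train_pair[1]))
print(sorted(params_a), sorted(params_b))
print(recomputed == train_d)                 # True
```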

Difficulty understanding TensorFlow computations
