TensorFlow fails on simple, idiomatic code

Date: 2019-04-23 02:33:42

Tags: python python-3.x tensorflow

I'm playing around with TensorFlow on Google Colab. Rather than using Keras, I'm trying to evaluate a simple hand-built perceptron in eager execution mode.

The perceptron expects a (1x2) input tensor and has two layers with the following weights and biases: W1: (2x2) / B1: (1x2) and W2: (2x1) / B2: (1x1), so the forward pass is X1 = sigmoid(X0·W1 + B1) followed by X2 = sigmoid(X1·W2 + B2).

I found that this simple piece of code fails for no apparent reason. It seems to be related to the optimizer: every optimizer I have tried fails with a different error. For example, with the optimizer used below (GradientDescentOptimizer), TensorFlow says the operation is not implemented, and I don't understand why. Here is a self-contained piece of code (TensorFlow 1.13.1 / Python 3):

import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()
with tf.device("GPU:0"):
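  # Randomly initialized weights and biases for the two layers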
  W1 = tf.random_uniform([2, 2], -1, 1, tf.float32)
  B1 = tf.random_uniform([1, 2], -1, 1, tf.float32)

  W2 = tf.random_uniform([2, 1], -1, 1, tf.float32)
  B2 = tf.random_uniform([1, 1], -1, 1, tf.float32)

  X0 = tf.convert_to_tensor(np.array([[0, 0]]), tf.float32)

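  # Record the forward pass and loss on a tape so gradients can be computed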
  with tf.GradientTape() as tape:
    tape.watch(W1)
    tape.watch(B1)
    tape.watch(W2)
    tape.watch(B2)

    X1 = tf.sigmoid(tf.matmul(X0, W1) + B1)
    X2 = tf.sigmoid(tf.matmul(X1, W2) + B2)

    Loss = tf.square(X2 - tf.constant([[1]], tf.float32))

  dLoss_dParams = tape.gradient(Loss, [W1, B1, W2, B2])  

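# Build a gradient-descent optimizer and apply one update step to the parameters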
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
optimizer.apply_gradients(zip(dLoss_dParams, [W1, B1, W2, B2]), tf.Variable(0))

[screenshot of the error message]

What am I doing wrong?

Thanks in advance!

1 answer:

Answer 0 (score: 1)

OK, just in case anyone else runs into the same problem: following @jdehesa's answer in the comments, the fix is to wrap the parameters in tf.Variable so the optimizer can actually update them. The resulting code looks like this (I have also updated the original code so that the perceptron now tries to solve the XOR problem):

import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

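# Create the optimizer once, outside the training loop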
optimizer = tf.train.AdamOptimizer()

with tf.device("GPU:0"):
  X0 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], np.float32) # 4x2

  W1 = tf.Variable(tf.random_uniform([2, 2], -1.0, 1.0, tf.float32)) # 4x2 * 2x2 => 4x2
  B1 = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0, tf.float32)) # 4x2 + 1x2 => 4x2

  W2 = tf.Variable(tf.random_uniform([2, 1], -1.0, 1.0, tf.float32)) # 4x2 * 2x1 => 4x1
  B2 = tf.Variable(tf.random_uniform([1, 1], -1.0, 1.0, tf.float32)) # 4x1 + 1x1 => 4x1

  with tf.GradientTape() as tape:
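    # tf.Variable objects are watched by the tape automatically, so the explicit tape.watch calls are no longer needed: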
    #     tape.watch(W1)
    #     tape.watch(B1)
    #     tape.watch(W2)
    #     tape.watch(B2)

    X1 = tf.tanh(tf.matmul(X0, W1) + B1)
    X2 = tf.tanh(tf.matmul(X1, W2) + B2)

    Loss = tf.square(X2 - tf.constant([[0], [1], [1], [0]], tf.float32))

  dLoss_dParams = tape.gradient(Loss, [W1, B1, W2, B2])
  optimizer.apply_gradients(zip(dLoss_dParams, [W1, B1, W2, B2]))
  print(Loss.numpy()[0][0]) 

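# Training loop: record the forward pass on a fresh tape each iteration and apply the gradients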
for i in range(10000):
  with tf.GradientTape() as tape:
    X1 = tf.tanh(tf.matmul(X0, W1) + B1)
    X2 = tf.tanh(tf.matmul(X1, W2) + B2)

    Loss = tf.reduce_mean(tf.square(X2 - tf.constant([[0], [1], [1], [0]], tf.float32)))

  dLoss_dParams = tape.gradient(Loss, [W1, B1, W2, B2])  
  optimizer.apply_gradients(zip(dLoss_dParams, [W1, B1, W2, B2]))

  if i % 1000 == 0:
    print(Loss.numpy())

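# Final forward pass with the trained parameters; the outputs should approximate XOR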
X1 = tf.tanh(tf.matmul(X0, W1) + B1)
X2 = tf.tanh(tf.matmul(X1, W2) + B2)

print(X2.numpy()[0][0])
print(X2.numpy()[1][0])
print(X2.numpy()[2][0])
print(X2.numpy()[3][0])
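
As a rough sanity check, one could also threshold the tanh outputs at, say, 0.5 and compare them against the XOR truth table (this continues the script above and reuses its numpy import and the X2 tensor):

predictions = (X2.numpy() > 0.5).astype(np.int32)   # map tanh outputs to 0/1
targets = np.array([[0], [1], [1], [0]], np.int32)  # XOR truth table
print(predictions.flatten())                         # expect [0 1 1 0] after training
print(np.array_equal(predictions, targets))          # True if the network solved XOR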