I wrote a simple NN with TensorFlow to drive a real robot finger. My problem is that after an hour of training it seems to have learned a little bit about which direction to move, but when I look at the weights in TensorBoard, only two values seem to get updated while all the others stay unchanged at roughly zero (where they were initialized). Why is that?
Here is my code: https://github.com/flobotics/flobotics_tensorflow_controller/blob/master/nodes/listener.py
The loss is decreasing, so it looks fine, even though it isn't :)
EDIT: I tried to minimize the code, I hope that is OK?
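The merged summary op used in the training step below comes from the usual TensorBoard summary setup; it is not part of the minimized snippet, but roughly it looks like this (tag names and log directory are simplified here, the real ones are in listener.py):

tf.histogram_summary("Weights", Weights)
tf.histogram_summary("biases", biases)
tf.scalar_summary("loss", loss)
merged = tf.merge_all_summaries()
summary_writer = tf.train.SummaryWriter("/tmp/flobotics", session.graph)  #log dir just an example
#after each training step: summary_writer.add_summary(result, step_counter)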
import tensorflow as tf
import numpy as np

NUM_STATES = 200+200+1024+1024 #200-degree angle goal, 200 possible degrees the joint can move, 1024 force values, two times
NUM_ACTIONS = 9 #3^2=9: stop, one speed left, one speed right, for each of the two servos
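#200+200+1024+1024 = 2448 input values in total, matching the (1,2448) state arrays used below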
session = tf.Session()
build_reward_state()
state = tf.placeholder("float", [None, NUM_STATES])
action = tf.placeholder("float", [None, NUM_ACTIONS])
target = tf.placeholder("float", [None])
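#state: the (1,2448) input vectors, action: one-hot (1,9) mask of the action that was taken, target: expected (discounted) reward for that action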
Weights = tf.Variable(tf.truncated_normal([NUM_STATES, NUM_ACTIONS], mean=0.1, stddev=0.02, dtype=tf.float32, seed=1), name="Weights")
biases = tf.Variable(tf.zeros([NUM_ACTIONS]), name="biases")
output = tf.matmul(state, Weights) + biases
output1 = tf.nn.relu(output)
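#readout_action is the predicted reward (Q-value) of the action that was actually taken: the one-hot action mask zeroes out all other outputs before the sum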
readout_action = tf.reduce_sum(tf.mul(output1, action), reduction_indices=1)
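#mean squared error between the computed target and the predicted reward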
loss = tf.reduce_mean(tf.square(target - readout_action))
train_operation = tf.train.AdamOptimizer(0.1).minimize(loss)
session.run(tf.initialize_all_variables())
a = 0  #simple state machine: 0 = init, 1 = act, 2 = stop, 3 = observe/train
while True:
    if a == 0:
        #a==0 is only run once at the beginning, afterwards only a==1,2,3 run
        state_from_env = get_current_state()  #we get an array of (1,2448)
        last_action = do nothing              #array (1,9), e.g. [0,0,1,0,0,0,0,0,0]
        a = 1
    if a == 1:
        #get a random action or the learned action, array of (1,9)
        #run this action (move the servo motors)
        #save this action in last_action
        a = 2
    if a == 2:
        #stop the servo motors (so the movements are NOT continuous)
        a = 3
    if a == 3:
        current_state = get_current_state()  #array of (1,2448)
        reward = get_reward()                #pseudo: one reward value
        observations.append((last_state, last_action, reward, current_state))
        if training_time:
            #get a random mini-batch sample from observations
            #(spelled out in the sketch after this listing)
            agents_reward_per_action = session.run(output, feed_dict={state: current_states})
            #for every sample i in the mini-batch:
            agents_expected_reward.append(rewards[i] + FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i]))
            _, result = session.run([train_operation, merged], feed_dict={state: previous_states, action: actions, target: agents_expected_reward})  #merged = TensorBoard summaries
        #update values
        last_state = current_state
        a = 1
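Spelled out, the mini-batch/training part of a==3 above is roughly the following (MINI_BATCH_SIZE and the FUTURE_REWARD_DISCOUNT value are just example numbers here, variable names as in the code above):

import random

MINI_BATCH_SIZE = 32          #example value
FUTURE_REWARD_DISCOUNT = 0.9  #example value

mini_batch = random.sample(observations, MINI_BATCH_SIZE)
previous_states = np.vstack([d[0] for d in mini_batch])  #(batch, 2448)
actions         = np.vstack([d[1] for d in mini_batch])  #(batch, 9)
rewards         = [d[2] for d in mini_batch]
current_states  = np.vstack([d[3] for d in mini_batch])  #(batch, 2448)

#predicted reward per action for every follow-up state
agents_reward_per_action = session.run(output, feed_dict={state: current_states})

#Bellman target: observed reward plus discounted best predicted future reward
agents_expected_reward = []
for i in range(len(mini_batch)):
    agents_expected_reward.append(
        rewards[i] + FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i]))

_, result = session.run([train_operation, merged],
                        feed_dict={state: previous_states,
                                   action: actions,
                                   target: agents_expected_reward})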