I wrote a simple NN with TensorFlow to drive a real robot finger. My problem is that after an hour of training it seems to have learned a little bit about which direction to move, but when I look at the weights in TensorBoard, only two values seem to get updated while all the others stay unchanged at roughly zero (where they were initialized). Why is that?
Here is my code: https://github.com/flobotics/flobotics_tensorflow_controller/blob/master/nodes/listener.py
The loss is decreasing, so it looks fine, even though it isn't :)
EDIT: I tried to minimize the code, I hope that is OK?
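The merged summary op used in the training step below comes from the usual TensorBoard summary setup; it is not part of the minimized snippet, but roughly it looks like this (tag names and log directory are simplified here, the real ones are in listener.py):

tf.histogram_summary("Weights", Weights)
tf.histogram_summary("biases", biases)
tf.scalar_summary("loss", loss)
merged = tf.merge_all_summaries()
summary_writer = tf.train.SummaryWriter("/tmp/flobotics", session.graph)  #log dir just an example
#after each training step: summary_writer.add_summary(result, step_counter)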
import tensorflow as tf
import numpy as np

NUM_STATES = 200+200+1024+1024 #200-degree angle goal, 200 possible degrees the joint can move, 1024 force values, two times
NUM_ACTIONS = 9 #3^2=9: stop, one speed left, one speed right, for each of the two servos
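#200+200+1024+1024 = 2448 input values in total, matching the (1,2448) state arrays used below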
session = tf.Session()
build_reward_state()
state = tf.placeholder("float", [None, NUM_STATES])
action = tf.placeholder("float", [None, NUM_ACTIONS])
target = tf.placeholder("float", [None])
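#state: the (1,2448) input vectors, action: one-hot (1,9) mask of the action that was taken, target: expected (discounted) reward for that action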
Weights = tf.Variable(tf.truncated_normal([NUM_STATES, NUM_ACTIONS], mean=0.1, stddev=0.02, dtype=tf.float32, seed=1), name="Weights")
biases = tf.Variable(tf.zeros([NUM_ACTIONS]), name="biases")
output = tf.matmul(state, Weights) + biases
output1 = tf.nn.relu(output)
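#readout_action is the predicted reward (Q-value) of the action that was actually taken: the one-hot action mask zeroes out all other outputs before the sum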
readout_action = tf.reduce_sum(tf.mul(output1, action), reduction_indices=1)
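#mean squared error between the computed target and the predicted reward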
loss = tf.reduce_mean(tf.square(target - readout_action))
train_operation = tf.train.AdamOptimizer(0.1).minimize(loss)
session.run(tf.initialize_all_variables())
a = 0  #simple state machine: 0 = init, 1 = act, 2 = stop, 3 = observe/train
while True:
    if a == 0:
        #a==0 is only run once at the beginning, afterwards only a==1,2,3 run
        state_from_env = get_current_state()  #we get an array of (1,2448)
        last_action = do nothing              #array (1,9), e.g. [0,0,1,0,0,0,0,0,0]
        a = 1
    if a == 1:
        #get a random action or the learned action, array of (1,9)
        #run this action (move the servo motors)
        #save this action in last_action
        a = 2
    if a == 2:
        #stop the servo motors (so the movements are NOT continuous)
        a = 3
    if a == 3:
        current_state = get_current_state()  #array of (1,2448)
        reward = get_reward()                #pseudo: one reward value
        observations.append((last_state, last_action, reward, current_state))
        if training_time:
            #get a random mini-batch sample from observations
            #(spelled out in the sketch after this listing)
            agents_reward_per_action = session.run(output, feed_dict={state: current_states})
            #for every sample i in the mini-batch:
            agents_expected_reward.append(rewards[i] + FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i]))
            _, result = session.run([train_operation, merged], feed_dict={state: previous_states, action: actions, target: agents_expected_reward})  #merged = TensorBoard summaries
        #update values
        last_state = current_state
        a = 1
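Spelled out, the mini-batch/training part of a==3 above is roughly the following (MINI_BATCH_SIZE and the FUTURE_REWARD_DISCOUNT value are just example numbers here, variable names as in the code above):

import random

MINI_BATCH_SIZE = 32          #example value
FUTURE_REWARD_DISCOUNT = 0.9  #example value

mini_batch = random.sample(observations, MINI_BATCH_SIZE)
previous_states = np.vstack([d[0] for d in mini_batch])  #(batch, 2448)
actions         = np.vstack([d[1] for d in mini_batch])  #(batch, 9)
rewards         = [d[2] for d in mini_batch]
current_states  = np.vstack([d[3] for d in mini_batch])  #(batch, 2448)

#predicted reward per action for every follow-up state
agents_reward_per_action = session.run(output, feed_dict={state: current_states})

#Bellman target: observed reward plus discounted best predicted future reward
agents_expected_reward = []
for i in range(len(mini_batch)):
    agents_expected_reward.append(
        rewards[i] + FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i]))

_, result = session.run([train_operation, merged],
                        feed_dict={state: previous_states,
                                   action: actions,
                                   target: agents_expected_reward})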