Python TensorFlow DQN next steps

Time: 2019-03-23 14:05:34

Tags: python tensorflow neural-network reinforcement-learning q-learning

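I can't figure out the next step for my Deep Q-Network. I'm trying to optimize a bus route. I have a distance matrix and data on how popular each stop is.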


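    import numpy as np

    # Schematic only: each "stopA-stopB" entry stands for the distance between stop A and stop B.
    distance = np.array([[0, stop1-stop2, stop1-stop3, stop1-stop4],
                         [stop2-stop1, 0, stop2-stop3, stop2-stop4],
                         [stop3-stop1, stop3-stop2, 0, stop3-stop4],
                         [stop4-stop1, stop4-stop2, stop4-stop3, 0]])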


(1 / distance) * (percent of total riders who get on and off at a specific stop)
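
To make the expression above concrete, here is a minimal NumPy sketch. It assumes the expression defines the rewards matrix mentioned in the code comments, that the rider share is indexed by the destination stop, and that the zero-distance diagonal gets a reward of 0. The distances and rider shares are made-up example values, and `reward_matrix` and `popularity` are hypothetical names, not from the question.

```python
import numpy as np

# Hypothetical example data (not from the question): distances between 4 stops.
distance = np.array([
    [0.0, 2.0, 5.0, 4.0],
    [2.0, 0.0, 3.0, 6.0],
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 6.0, 1.0, 0.0],
])

# Hypothetical share of total riders who get on or off at each stop (sums to 1).
popularity = np.array([0.4, 0.3, 0.2, 0.1])


def reward_matrix(distance, popularity):
    """rewards[i, j] = (1 / distance[i, j]) * popularity[j], with a zero diagonal."""
    inv = np.zeros_like(distance, dtype=float)
    off_diag = distance > 0              # skip the diagonal to avoid dividing by zero
    inv[off_diag] = 1.0 / distance[off_diag]
    return inv * popularity[np.newaxis, :]   # scale column j by popularity[j]


rewards = reward_matrix(distance, popularity)
```

Row `i` of `rewards` is then the observation for an agent currently at stop `i`. Whether popularity should weight the destination, the origin, or both is a modelling choice the question leaves open.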




    import tensorflow as tf

    # Current game states: rows of the rewards matrix corresponding to the agent's current stop. These are the inputs to the neural network.
    num_stops = 4  # assumed: number of stops in the 4x4 distance matrix above
    observations = tf.placeholder('float32', shape=[None, num_stops])

    # Actions: an integer from 0 to num_stops - 1, denoting which stop the agent traveled to from its current location.
    actions = tf.placeholder('int32', shape=[None])

    # Rewards received by the agent for its decisions: +1 if the agent 'wins' the game, i.e. gets the system score to 0 (this will only happen if the bus stops are not updated periodically).
    rewards = tf.placeholder('float32', shape=[None])  # +1, -1 with discounts

    # Model

    # First layer of the neural network: takes the observations tensor as input and has 200 hidden units. This number is arbitrary; I'm not sure how to tune it for peak performance.
    Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
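
The snippet stops after the first hidden layer. One common way to continue a DQN from here is to add a linear output layer with one Q-value per stop, choose actions epsilon-greedily, and regress the Q-value of the action taken toward the target r + gamma * max Q(s', a'). The sketch below illustrates those pieces in plain NumPy rather than TensorFlow so that it runs on its own. The weights, the batch of transitions, `gamma`, and all function names are hypothetical, and no gradient step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
num_stops = 4        # assumed from the 4x4 distance matrix
hidden_units = 200   # same arbitrary width as the dense layer above
gamma = 0.99         # assumed discount factor

# Randomly initialised weights: ReLU hidden layer and linear output layer.
W1 = rng.normal(0.0, 0.1, size=(num_stops, hidden_units))
b1 = np.zeros(hidden_units)
W2 = rng.normal(0.0, 0.1, size=(hidden_units, num_stops))
b2 = np.zeros(num_stops)


def q_values(observations):
    """Forward pass: one Q-value per stop for each observation row."""
    hidden = np.maximum(0.0, observations @ W1 + b1)
    return hidden @ W2 + b2


def epsilon_greedy(q_row, epsilon):
    """Pick a random stop with probability epsilon, else the highest-Q stop."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def td_targets(rewards, next_observations, done):
    """Q-learning target r + gamma * max Q(s', a'); no bootstrap when done == 1."""
    next_q = q_values(next_observations).max(axis=1)
    return rewards + gamma * next_q * (1.0 - done)


def td_loss(observations, actions, targets):
    """Mean squared error between Q(s, a) for the taken actions and the targets."""
    q = q_values(observations)
    q_taken = q[np.arange(len(actions)), actions]
    return float(np.mean((q_taken - targets) ** 2))


# Tiny batch of hypothetical transitions.
obs = rng.random((3, num_stops))
next_obs = rng.random((3, num_stops))
acts = np.array([1, 3, 0])
rews = np.array([0.15, 0.1, -1.0])
done = np.array([0.0, 0.0, 1.0])

targets = td_targets(rews, next_obs, done)
loss = td_loss(obs, acts, targets)
action = epsilon_greedy(q_values(obs[:1])[0], epsilon=0.1)
```

In a TensorFlow version, the same loss would be minimised with an optimizer over the `observations`, `actions`, and `rewards` placeholders. A replay buffer and a separate target network are common additions, but they are omitted here to keep the sketch short.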



0 answers
