Question

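I am trying to get my own data to work with a convolutional neural network in Theano / Lasagne.

A state consists of 4 images of 80x80 pixels. A batch contains 32 states, and the batch is the input to the neural network. The network output has 5 units (one for each of the 5 possible actions in the game). See also Edit 2.

Before training the model, the program observes states. While testing the training function I set the number of observations to only 500, and everything worked fine. But when I set the number of observations to 50,000, I suddenly got this error: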

Traceback (most recent call last):
File "snake_player.py", line 320, in <module>
  import snake
File "C:\A Bright Future\machine-learning\snake\PyGamePlayer-master\examples\snake.py", line 124, in <module>
  pygame.display.update()
File "C:\A Bright Future\machine-learning\snake\PyGamePlayer-master\examples\pygame_player.py", line 28, in wrap
  intercepted_results = intercepting_func(real_results, *args, **kwargs)  # call our own function a
File "C:\A Bright Future\machine-learning\snake\PyGamePlayer-master\examples\pygame_player.py", line 149, in _on_screen_update
  keys = self.get_keys_pressed(surface_array, reward, terminal)
File "snake_player.py", line 126, in get_keys_pressed
  self._train()
File "snake_player.py", line 237, in _train
  self._train_err += train_fn(previous_states, agents_expected_reward)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 871, in __call__
  storage_map=getattr(self.fn, 'storage_map', None))
File "C:\Anaconda2\lib\site-packages\theano\gof\link.py", line 314, in raise_with_op
  reraise(exc_type, exc_value, exc_trace)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 859, in __call__
  outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, targets)
Toposort index: 50
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(32L, 5L), (5L,), (32L,)]
Inputs strides: [(40L, 8L), (8L,), (4L,)]
Inputs values: ['not shown', array([ 0.1,  0.1,  0.1,  0.1,  0.1]), 'not shown']
Outputs clients: [[Sum{acc_dtype=float64}(CrossentropySoftmaxArgmax1HotWithBias.0)], [CrossentropySoftmax1HotWithBiasDx(TensorConstant{(32L,) of 0.03125}, CrossentropySoftmaxArgmax1HotWithBias.1, targets)], []]

Debugprint of the apply node:

CrossentropySoftmaxArgmax1HotWithBias.0 [id A] <TensorType(float64, vector)> ''
 |Dot22 [id B] <TensorType(float64, matrix)> ''
 | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id C] <TensorType(float64, matrix)> ''
 | | |TensorConstant{(1L, 1L) of 0.5} [id D] <TensorType(float64, (True, True))>
 | | |Elemwise{add,no_inplace} [id E] <TensorType(float64, matrix)> ''
 | | | |Dot22 [id F] <TensorType(float64, matrix)> ''
 | | | | |Reshape{2} [id G] <TensorType(float64, matrix)> ''
 | | | | | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id H] <TensorType(float64, 4D)> ''
 | | | | | | |TensorConstant{(1L, 1L, 1..1L) of 0.5} [id I] <TensorType(float64, (True, True, True, True))>
 | | | | | | |Elemwise{add,no_inplace} [id J] <TensorType(float64, 4D)> ''
 | | | | | | | |ConvOp{('imshp', (64, 8, 8)),('kshp', (3, 3)),('nkern', 64),('bsize', None),('dx', 1),('dy', 1),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (64, 8, 8)),('kshp_logical', (3, 3)),('kshp_logical_top_aligned', True)} [id K] <TensorType(float64, 4D)> ''
 | | | | | | | | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id L] <TensorType(float64, 4D)> ''
 | | | | | | | | | |TensorConstant{(1L, 1L, 1..1L) of 0.5} [id I] <TensorType(float64, (True, True, True, True))>
 | | | | | | | | | |Elemwise{add,no_inplace} [id M] <TensorType(float64, 4D)> ''
 | | | | | | | | | | |ConvOp{('imshp', (32, 19, 19)),('kshp', (4, 4)),('nkern', 64),('bsize', None),('dx', 2),('dy', 2),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (32, 19, 19)),('kshp_logical', (4, 4)),('kshp_logical_top_aligned', True)} [id N] <TensorType(float64, 4D)> ''
 | | | | | | | | | | | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id O] <TensorType(float64, 4D)> ''
 | | | | | | | | | | | | |TensorConstant{(1L, 1L, 1..1L) of 0.5} [id I] <TensorType(float64, (True, True, True, True))>
 | | | | | | | | | | | | |Elemwise{add,no_inplace} [id P] <TensorType(float64, 4D)> ''
 | | | | | | | | | | | | | |ConvOp{('imshp', (4, 80, 80)),('kshp', (8, 8)),('nkern', 32),('bsize', None),('dx', 4),('dy', 4),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (4, 80, 80)),('kshp_logical', (8, 8)),('kshp_logical_top_aligned', True)} [id Q] <TensorType(float64, 4D)> '
 | | | | | | | | | | | | | | |TensorConstant{[[[[ 1.   ..      ]]]]} [id R] <TensorType(float64, 4D)>
 | | | | | | | | | | | | | | |W [id S] <TensorType(float64, 4D)>
 | | | | | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id T] <TensorType(float64, (True, False, True, True))> ''
 | | | | | | | | | | | | |   |b [id U] <TensorType(float64, vector)>
 | | | | | | | | | | | | |ConvOp{('imshp', (4, 80, 80)),('kshp', (8, 8)),('nkern', 32),('bsize', None),('dx', 4),('dy', 4),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (4, 80, 80)),('kshp_logical', (8, 8)),('kshp_logical_top_aligned', True)} [id Q] <TensorType(float64, 4D)> ''
 | | | | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id T] <TensorType(float64, (True, False, True, True))> ''
 | | | | | | | | | | | |W [id V] <TensorType(float64, 4D)>
 | | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id W] <TensorType(float64, (True, False, True, True))> ''
 | | | | | | | | | |   |b [id X] <TensorType(float64, vector)>
 | | | | | | | | | |ConvOp{('imshp', (32, 19, 19)),('kshp', (4, 4)),('nkern', 64),('bsize', None),('dx', 2),('dy', 2),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (32, 19, 19)),('kshp_logical', (4, 4)),('kshp_logical_top_aligned', True)} [id N] <TensorType(float64, 4D)> ''
 | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id W] <TensorType(float64, (True, False, True, True))> ''
 | | | | | | | | |W [id Y] <TensorType(float64, 4D)>
 | | | | | | | |InplaceDimShuffle{x,0,x,x} [id Z] <TensorType(float64, (True, False, True, True))> ''
 | | | | | | |   |b [id BA] <TensorType(float64, vector)>
 | | | | | | |ConvOp{('imshp', (64, 8, 8)),('kshp', (3, 3)),('nkern', 64),('bsize', None),('dx', 1),('dy', 1),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (64, 8, 8)),('kshp_logical', (3, 3)),('kshp_logical_top_aligned', True)} [id K] <TensorType(float64, 4D)> ''
 | | | | | | |InplaceDimShuffle{x,0,x,x} [id Z] <TensorType(float64, (True, False, True, True))> ''
 | | | | | |TensorConstant{[32 -1]} [id BB] <TensorType(int64, vector)>
 | | | | |W [id BC] <TensorType(float64, matrix)>
 | | | |InplaceDimShuffle{x,0} [id BD] <TensorType(float64, row)> ''
 | | |   |b [id BE] <TensorType(float64, vector)>
 | | |Dot22 [id F] <TensorType(float64, matrix)> ''
 | | |InplaceDimShuffle{x,0} [id BD] <TensorType(float64, row)> ''
 | |W [id BF] <TensorType(float64, matrix)>
 |b [id BG] <TensorType(float64, vector)>
 |targets [id BH] <TensorType(int32, vector)>
CrossentropySoftmaxArgmax1HotWithBias.1 [id A] <TensorType(float64, matrix)> ''
CrossentropySoftmaxArgmax1HotWithBias.2 [id A] <TensorType(int32, vector)> ''
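
The layer shapes in the debugprint above can be cross-checked by hand. This is a minimal sketch, assuming 'valid' convolutions without padding (as the `out_mode` fields indicate); `conv_out_size` is a hypothetical helper and not part of Theano or Lasagne.

```python
def conv_out_size(size, kernel, stride):
    """Output width/height of a 'valid' convolution (no padding)."""
    return (size - kernel) // stride + 1

# (kernel size, stride, number of filters) read off the three ConvOp nodes
layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

size = 80  # each input image is 80x80
shapes = []
for kernel, stride, filters in layers:
    size = conv_out_size(size, kernel, stride)
    shapes.append((filters, size, size))

# Number of features after flattening the last convolutional output
flat_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)
print(flat_features)
```

The result agrees with the `imshp` values (32, 19, 19) and (64, 8, 8) and with the (2304, 512) weight matrix in the storage map, which suggests the layer shapes themselves are consistent.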

The storage map footprint:

Storage map footprint:
- W, Shared Input, Shape: (2304L, 512L), ElemSize: 8 Byte(s), TotalSize: 9437184 Byte(s)
- TensorConstant{[[[[ 1.   ..      ]]]]}, Shape: (32L, 4L, 80L, 80L), ElemSize: 8 Byte(s), TotalSize: 6553600 Byte(s)
- <TensorType(float64, 4D)>, Shared Input, Shape: (32L, 4L, 80L, 80L), ElemSize: 8 Byte(s), TotalSize: 6553600 Byte(s)
- inputs, Input, Shape: (32L, 80L, 80L, 4L), ElemSize: 8 Byte(s), TotalSize: 6553600 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (32L, 32L, 19L, 19L), ElemSize: 8 Byte(s), TotalSize: 2957312 Byte(s)
- Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, Shape: (32L, 32L, 19L, 19L), ElemSize: 8 Byte(s), TotalSize: 2957312 Byte(s)
 - Elemwise{add,no_inplace}.0, Shape: (32L, 64L, 8L, 8L), ElemSize: 8 Byte(s), TotalSize: 1048576 Byte(s)
 - Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, Shape: (32L, 64L, 8L, 8L), ElemSize: 8 Byte(s), TotalSize: 1048576 Byte(s)
 - Reshape{2}.0, Shape: (32L, 2304L), ElemSize: 8 Byte(s), TotalSize: 589824 Byte(s)
 - Elemwise{add,no_inplace}.0, Shape: (32L, 64L, 6L, 6L), ElemSize: 8 Byte(s), TotalSize: 589824 Byte(s)
 - W, Shared Input, Shape: (64L, 64L, 3L, 3L), ElemSize: 8 Byte(s), TotalSize: 294912 Byte(s)
 - W, Shared Input, Shape: (64L, 32L, 4L, 4L), ElemSize: 8 Byte(s), TotalSize: 262144 Byte(s)
 - Elemwise{add,no_inplace}.0, Shape: (32L, 512L), ElemSize: 8 Byte(s), TotalSize: 131072 Byte(s)
 - Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, Shape: (32L, 512L), ElemSize: 8 Byte(s), TotalSize: 131072 Byte(s)
 - W, Shared Input, Shape: (32L, 4L, 8L, 8L), ElemSize: 8 Byte(s), TotalSize: 65536 Byte(s)
 - W, Shared Input, Shape: (512L, 5L), ElemSize: 8 Byte(s), TotalSize: 20480 Byte(s)
 - b, Shared Input, Shape: (512L,), ElemSize: 8 Byte(s), TotalSize: 4096 Byte(s)
 - Dot22.0, Shape: (32L, 5L), ElemSize: 8 Byte(s), TotalSize: 1280 Byte(s)
- b, Shared Input, Shape: (64L,), ElemSize: 8 Byte(s), TotalSize: 512 Byte(s)
 - b, Shared Input, Shape: (64L,), ElemSize: 8 Byte(s), TotalSize: 512 Byte(s)
 - b, Shared Input, Shape: (32L,), ElemSize: 8 Byte(s), TotalSize: 256 Byte(s)
 - TensorConstant{(32L,) of 0.03125}, Shape: (32L,), ElemSize: 8 Byte(s), TotalSize: 256 Byte(s)
 - targets, Input, Shape: (32L,), ElemSize: 4 Byte(s), TotalSize: 128 Byte(s)
 - b, Shared Input, Shape: (5L,), ElemSize: 8 Byte(s), TotalSize: 40 Byte(s)
 - TensorConstant{[32 64  6  6]}, Shape: (4L,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
 - TensorConstant{(2L,) of 19}, Shape: (2L,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
 - TensorConstant{[32 -1]}, Shape: (2L,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
 - TensorConstant{[4 4 1]}, Shape: (3L,), ElemSize: 4 Byte(s), TotalSize: 12 Byte(s)
 - TensorConstant{[2 2 1]}, Shape: (3L,), ElemSize: 4 Byte(s), TotalSize: 12 Byte(s)
 - TensorConstant{(1L,) of 1e-06}, Shape: (1L,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
 - TensorConstant{0.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{0.03125}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{(1L, 1L) of 0.5}, Shape: (1L, 1L), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
 - TensorConstant{(1L, 1L, 1..1L) of 0.5}, Shape: (1L, 1L, 1L, 1L), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
 - TensorConstant{(1L, 1L, 1..) of 1e-06}, Shape: (1L, 1L, 1L, 1L), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
 - Constant{-1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - Constant{4}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{0.5}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - Constant{1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{-1e-06}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{1.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
 TotalSize: 39201897.0 Byte(s) 0.037 GB
 TotalSize inputs: 29747049.0 Byte(s) 0.028 GB

My training function is as follows:

def _train(self):
    start_time = time.time()
    # Prepare Theano variables for inputs and targets
    input_variable = T.tensor4('inputs')
    states = T.tensor4('states')
    expected = T.tensor4('expected')
    real_rewards = T.tensor4('rewards')
    print "sampling mini batch..."
    # sample a mini_batch to train on
    mini_batch = random.sample(self._observations, self.MINI_BATCH_SIZE)
    # get the batch variables
    previous_states = [d[self.OBS_LAST_STATE_INDEX] for d in mini_batch]
    actions = [d[self.OBS_ACTION_INDEX] for d in mini_batch]
    rewards = [d[self.OBS_REWARD_INDEX] for d in mini_batch]
    current_states = np.array([d[self.OBS_CURRENT_STATE_INDEX] for d in mini_batch])
    agents_expected_reward = []
    print "compiling current states..."
    current_states = np.rollaxis(current_states, 3, 1)

    print "getting network output from current states..."
    agents_reward_per_action = lasagne.layers.get_output(self._output_layer, current_states)

    self._train_err = 0
    print "rewards adding..."
    for i in range(len(mini_batch)):
        if mini_batch[i][self.OBS_TERMINAL_INDEX]:
            agents_expected_reward.append(rewards[i])
        else:
            agents_expected_reward.append(
                rewards[i] + self.FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i].eval()))

    network = self._output_layer
    prediction = agents_reward_per_action
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
    loss = loss.mean()

    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.sgd(loss, params, self.LEARN_RATE)
    givens = {
        states: current_states,
        expected: agents_expected_reward,
        real_rewards: rewards
    }
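    # input_var and target_var are not defined in this snippet; they are
    # presumably created elsewhere in the full code linked below
    train_fn = theano.function([input_var, target_var], loss,
                               updates=updates, on_unused_input='warn',
                               givens=givens,
                               allow_input_downcast=True)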

    self._train_err += train_fn(previous_states, agents_expected_reward)
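
Separated from Theano, the target computation in the loop above amounts to the following. This is a minimal sketch, assuming the per-action network outputs are already available as plain lists; `expected_rewards` and its parameter names are hypothetical and not taken from the linked code.

```python
def expected_rewards(rewards, terminals, next_action_values, discount):
    """Return the reward for terminal states; otherwise the reward plus
    the discounted best predicted value of the following state."""
    targets = []
    for reward, terminal, values in zip(rewards, terminals, next_action_values):
        if terminal:
            targets.append(reward)
        else:
            targets.append(reward + discount * max(values))
    return targets

targets = expected_rewards(
    rewards=[1.0, 0.0],
    terminals=[True, False],
    next_action_values=[[0.0] * 5, [0.5, 2.0, 1.0, 0.0, 0.25]],
    discount=0.5,
)
print(targets)  # [1.0, 1.0]
```

Note that the resulting targets are real-valued and are not limited to the range 0 to 4.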

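I don't understand why this happens with 50,000 observations but not with 500. The only thing that changed is the number of observations, so why would a value suddenly be out of bounds? Any ideas as to why this happens? Every answer is much appreciated. Thanks.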

The full code is here:

Main file: http://pastebin.com/y53VAauT
Snake game file: http://pastebin.com/Xx4aPRu7
PyGame player file: http://pastebin.com/EKAYt5N8 (gets the game score and sends the actions)

Edit: It also happens with 1,000 observations and above. Still no idea why.

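Edit 2: I found that the problem is related to the rewards. Because the model is not yet trained, it mostly moves randomly. That is why the problem shows up with more observations (the chance of a reward higher than 5 is greater). It gives the error when the reward (the length of the snake) is higher than 5, which is odd, because 5 is the number of possible actions (the outputs of the neural network). Getting closer to the solution!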
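
One possible reading of this observation, offered as a sketch rather than a confirmed diagnosis: the failing op in the traceback receives `targets` as an int32 vector next to a (32, 5) matrix, which is the shape of a loss that treats each target as a class index. Under that assumption only the values 0 to 4 would be valid. The pure-Python function below is a hypothetical stand-in for that behaviour, not Theano's implementation.

```python
import math

def cross_entropy_with_index_targets(logits, targets):
    """Softmax cross-entropy where each target is a class index."""
    losses = []
    for row, target in zip(logits, targets):
        if not 0 <= target < len(row):
            # Same message as in the traceback
            raise ValueError("y_i value out of bounds")
        shift = max(row)  # subtract the max for numerical stability
        log_sum = shift + math.log(sum(math.exp(v - shift) for v in row))
        losses.append(log_sum - row[target])
    return losses

logits = [[0.1, 0.1, 0.1, 0.1, 0.1]]  # one sample, 5 network outputs

# A target of 4 is a valid index into 5 outputs
print(cross_entropy_with_index_targets(logits, [4]))

# A target of 6 (for example a reward of 6) cannot index 5 outputs
try:
    cross_entropy_with_index_targets(logits, [6])
except ValueError as err:
    print(err)
```

If the targets are meant to be real-valued expected rewards, a regression objective such as `lasagne.objectives.squared_error` does not index by the target value, though whether that fits this network is a separate question.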

Theano error: ValueError: y_i value out of bounds

0 Answers: