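I am trying to get my own data to work with a convolutional neural network in Theano/Lasagne.
A state consists of 4 images of 80x80 pixels. A batch contains 32 states, and the batch is the input to the neural network. The network output has 5 units, one for each of the 5 possible actions in the game. See also Edit 2.
Before training the model, the program first collects observations of states. While testing the training function I set the number of observations to only 500, and everything worked fine. But when I set the number of observations to 50,000, I suddenly got this error: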
Traceback (most recent call last):
File "snake_player.py", line 320, in <module>
import snake
File "C:\A Bright Future\machine-learning\snake\PyGamePlayer-master\examples\snake.py", line 124, in <module>
pygame.display.update()
File "C:\A Bright Future\machine-learning\snake\PyGamePlayer-master\examples\pygame_player.py", line 28, in wrap
intercepted_results = intercepting_func(real_results, *args, **kwargs) # call our own function a
File "C:\A Bright Future\machine-learning\snake\PyGamePlayer-master\examples\pygame_player.py", line 149, in _on_screen_update
keys = self.get_keys_pressed(surface_array, reward, terminal)
File "snake_player.py", line 126, in get_keys_pressed
self._train()
File "snake_player.py", line 237, in _train
self._train_err += train_fn(previous_states, agents_expected_reward)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "C:\Anaconda2\lib\site-packages\theano\gof\link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 859, in __call__
outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, targets)
Toposort index: 50
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(32L, 5L), (5L,), (32L,)]
Inputs strides: [(40L, 8L), (8L,), (4L,)]
Inputs values: ['not shown', array([ 0.1, 0.1, 0.1, 0.1, 0.1]), 'not shown']
Outputs clients: [[Sum{acc_dtype=float64}(CrossentropySoftmaxArgmax1HotWithBias.0)], [CrossentropySoftmax1HotWithBiasDx(TensorConstant{(32L,) of 0.03125}, CrossentropySoftmaxArgmax1HotWithBias.1, targets)], []]
Debugprint of the apply node:
CrossentropySoftmaxArgmax1HotWithBias.0 [id A] <TensorType(float64, vector)> ''
|Dot22 [id B] <TensorType(float64, matrix)> ''
| |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id C] <TensorType(float64, matrix)> ''
| | |TensorConstant{(1L, 1L) of 0.5} [id D] <TensorType(float64, (True, True))>
| | |Elemwise{add,no_inplace} [id E] <TensorType(float64, matrix)> ''
| | | |Dot22 [id F] <TensorType(float64, matrix)> ''
| | | | |Reshape{2} [id G] <TensorType(float64, matrix)> ''
| | | | | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id H] <TensorType(float64, 4D)> ''
| | | | | | |TensorConstant{(1L, 1L, 1..1L) of 0.5} [id I] <TensorType(float64, (True, True, True, True))>
| | | | | | |Elemwise{add,no_inplace} [id J] <TensorType(float64, 4D)> ''
| | | | | | | |ConvOp{('imshp', (64, 8, 8)),('kshp', (3, 3)),('nkern', 64),('bsize', None),('dx', 1),('dy', 1),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (64, 8, 8)),('kshp_logical', (3, 3)),('kshp_logical_top_aligned', True)} [id K] <TensorType(float64, 4D)> ''
| | | | | | | | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id L] <TensorType(float64, 4D)> ''
| | | | | | | | | |TensorConstant{(1L, 1L, 1..1L) of 0.5} [id I] <TensorType(float64, (True, True, True, True))>
| | | | | | | | | |Elemwise{add,no_inplace} [id M] <TensorType(float64, 4D)> ''
| | | | | | | | | | |ConvOp{('imshp', (32, 19, 19)),('kshp', (4, 4)),('nkern', 64),('bsize', None),('dx', 2),('dy', 2),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (32, 19, 19)),('kshp_logical', (4, 4)),('kshp_logical_top_aligned', True)} [id N] <TensorType(float64, 4D)> ''
| | | | | | | | | | | |Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)] [id O] <TensorType(float64, 4D)> ''
| | | | | | | | | | | | |TensorConstant{(1L, 1L, 1..1L) of 0.5} [id I] <TensorType(float64, (True, True, True, True))>
| | | | | | | | | | | | |Elemwise{add,no_inplace} [id P] <TensorType(float64, 4D)> ''
| | | | | | | | | | | | | |ConvOp{('imshp', (4, 80, 80)),('kshp', (8, 8)),('nkern', 32),('bsize', None),('dx', 4),('dy', 4),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (4, 80, 80)),('kshp_logical', (8, 8)),('kshp_logical_top_aligned', True)} [id Q] <TensorType(float64, 4D)> '
| | | | | | | | | | | | | | |TensorConstant{[[[[ 1. .. ]]]]} [id R] <TensorType(float64, 4D)>
| | | | | | | | | | | | | | |W [id S] <TensorType(float64, 4D)>
| | | | | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id T] <TensorType(float64, (True, False, True, True))> ''
| | | | | | | | | | | | | |b [id U] <TensorType(float64, vector)>
| | | | | | | | | | | | |ConvOp{('imshp', (4, 80, 80)),('kshp', (8, 8)),('nkern', 32),('bsize', None),('dx', 4),('dy', 4),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (4, 80, 80)),('kshp_logical', (8, 8)),('kshp_logical_top_aligned', True)} [id Q] <TensorType(float64, 4D)> ''
| | | | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id T] <TensorType(float64, (True, False, True, True))> ''
| | | | | | | | | | | |W [id V] <TensorType(float64, 4D)>
| | | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id W] <TensorType(float64, (True, False, True, True))> ''
| | | | | | | | | | |b [id X] <TensorType(float64, vector)>
| | | | | | | | | |ConvOp{('imshp', (32, 19, 19)),('kshp', (4, 4)),('nkern', 64),('bsize', None),('dx', 2),('dy', 2),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (32, 19, 19)),('kshp_logical', (4, 4)),('kshp_logical_top_aligned', True)} [id N] <TensorType(float64, 4D)> ''
| | | | | | | | | |InplaceDimShuffle{x,0,x,x} [id W] <TensorType(float64, (True, False, True, True))> ''
| | | | | | | | |W [id Y] <TensorType(float64, 4D)>
| | | | | | | |InplaceDimShuffle{x,0,x,x} [id Z] <TensorType(float64, (True, False, True, True))> ''
| | | | | | | |b [id BA] <TensorType(float64, vector)>
| | | | | | |ConvOp{('imshp', (64, 8, 8)),('kshp', (3, 3)),('nkern', 64),('bsize', None),('dx', 1),('dy', 1),('out_mode', 'valid'),('unroll_batch', None),('unroll_kern', None),('unroll_patch', True),('imshp_logical', (64, 8, 8)),('kshp_logical', (3, 3)),('kshp_logical_top_aligned', True)} [id K] <TensorType(float64, 4D)> ''
| | | | | | |InplaceDimShuffle{x,0,x,x} [id Z] <TensorType(float64, (True, False, True, True))> ''
| | | | | |TensorConstant{[32 -1]} [id BB] <TensorType(int64, vector)>
| | | | |W [id BC] <TensorType(float64, matrix)>
| | | |InplaceDimShuffle{x,0} [id BD] <TensorType(float64, row)> ''
| | | |b [id BE] <TensorType(float64, vector)>
| | |Dot22 [id F] <TensorType(float64, matrix)> ''
| | |InplaceDimShuffle{x,0} [id BD] <TensorType(float64, row)> ''
| |W [id BF] <TensorType(float64, matrix)>
|b [id BG] <TensorType(float64, vector)>
|targets [id BH] <TensorType(int32, vector)>
CrossentropySoftmaxArgmax1HotWithBias.1 [id A] <TensorType(float64, matrix)> ''
CrossentropySoftmaxArgmax1HotWithBias.2 [id A] <TensorType(int32, vector)> ''
Storage map footprint:
- W, Shared Input, Shape: (2304L, 512L), ElemSize: 8 Byte(s), TotalSize: 9437184 Byte(s)
- TensorConstant{[[[[ 1. .. ]]]]}, Shape: (32L, 4L, 80L, 80L), ElemSize: 8 Byte(s), TotalSize: 6553600 Byte(s)
- <TensorType(float64, 4D)>, Shared Input, Shape: (32L, 4L, 80L, 80L), ElemSize: 8 Byte(s), TotalSize: 6553600 Byte(s)
- inputs, Input, Shape: (32L, 80L, 80L, 4L), ElemSize: 8 Byte(s), TotalSize: 6553600 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (32L, 32L, 19L, 19L), ElemSize: 8 Byte(s), TotalSize: 2957312 Byte(s)
- Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, Shape: (32L, 32L, 19L, 19L), ElemSize: 8 Byte(s), TotalSize: 2957312 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (32L, 64L, 8L, 8L), ElemSize: 8 Byte(s), TotalSize: 1048576 Byte(s)
- Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, Shape: (32L, 64L, 8L, 8L), ElemSize: 8 Byte(s), TotalSize: 1048576 Byte(s)
- Reshape{2}.0, Shape: (32L, 2304L), ElemSize: 8 Byte(s), TotalSize: 589824 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (32L, 64L, 6L, 6L), ElemSize: 8 Byte(s), TotalSize: 589824 Byte(s)
- W, Shared Input, Shape: (64L, 64L, 3L, 3L), ElemSize: 8 Byte(s), TotalSize: 294912 Byte(s)
- W, Shared Input, Shape: (64L, 32L, 4L, 4L), ElemSize: 8 Byte(s), TotalSize: 262144 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (32L, 512L), ElemSize: 8 Byte(s), TotalSize: 131072 Byte(s)
- Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, Shape: (32L, 512L), ElemSize: 8 Byte(s), TotalSize: 131072 Byte(s)
- W, Shared Input, Shape: (32L, 4L, 8L, 8L), ElemSize: 8 Byte(s), TotalSize: 65536 Byte(s)
- W, Shared Input, Shape: (512L, 5L), ElemSize: 8 Byte(s), TotalSize: 20480 Byte(s)
- b, Shared Input, Shape: (512L,), ElemSize: 8 Byte(s), TotalSize: 4096 Byte(s)
- Dot22.0, Shape: (32L, 5L), ElemSize: 8 Byte(s), TotalSize: 1280 Byte(s)
- b, Shared Input, Shape: (64L,), ElemSize: 8 Byte(s), TotalSize: 512 Byte(s)
- b, Shared Input, Shape: (64L,), ElemSize: 8 Byte(s), TotalSize: 512 Byte(s)
- b, Shared Input, Shape: (32L,), ElemSize: 8 Byte(s), TotalSize: 256 Byte(s)
- TensorConstant{(32L,) of 0.03125}, Shape: (32L,), ElemSize: 8 Byte(s), TotalSize: 256 Byte(s)
- targets, Input, Shape: (32L,), ElemSize: 4 Byte(s), TotalSize: 128 Byte(s)
- b, Shared Input, Shape: (5L,), ElemSize: 8 Byte(s), TotalSize: 40 Byte(s)
- TensorConstant{[32 64 6 6]}, Shape: (4L,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- TensorConstant{(2L,) of 19}, Shape: (2L,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- TensorConstant{[32 -1]}, Shape: (2L,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- TensorConstant{[4 4 1]}, Shape: (3L,), ElemSize: 4 Byte(s), TotalSize: 12 Byte(s)
- TensorConstant{[2 2 1]}, Shape: (3L,), ElemSize: 4 Byte(s), TotalSize: 12 Byte(s)
- TensorConstant{(1L,) of 1e-06}, Shape: (1L,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- TensorConstant{0.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{0.03125}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{(1L, 1L) of 0.5}, Shape: (1L, 1L), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- TensorConstant{(1L, 1L, 1..1L) of 0.5}, Shape: (1L, 1L, 1L, 1L), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- TensorConstant{(1L, 1L, 1..) of 1e-06}, Shape: (1L, 1L, 1L, 1L), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- Constant{-1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{4}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{0.5}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{-1e-06}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{1.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
TotalSize: 39201897.0 Byte(s) 0.037 GB
TotalSize inputs: 29747049.0 Byte(s) 0.028 GB
My training function looks like this:
def _train(self):
    start_time = time.time()
    # Prepare Theano variables for inputs and targets
    input_variable = T.tensor4('inputs')
    states = T.tensor4('states')
    expected = T.tensor4('expected')
    real_rewards = T.tensor4('rewards')
    print "sampling mini batch..."
    # sample a mini_batch to train on
    mini_batch = random.sample(self._observations, self.MINI_BATCH_SIZE)
    # get the batch variables
    previous_states = [d[self.OBS_LAST_STATE_INDEX] for d in mini_batch]
    actions = [d[self.OBS_ACTION_INDEX] for d in mini_batch]
    rewards = [d[self.OBS_REWARD_INDEX] for d in mini_batch]
    current_states = np.array([d[self.OBS_CURRENT_STATE_INDEX] for d in mini_batch])
    agents_expected_reward = []
    print "compiling current states..."
    current_states = np.rollaxis(current_states, 3, 1)
    print "getting network output from current states..."
    agents_reward_per_action = lasagne.layers.get_output(self._output_layer, current_states)
    self._train_err = 0
    print "rewards adding..."
    for i in range(len(mini_batch)):
        if mini_batch[i][self.OBS_TERMINAL_INDEX]:
            agents_expected_reward.append(rewards[i])
        else:
            agents_expected_reward.append(
                rewards[i] + self.FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i].eval()))
    network = self._output_layer
    prediction = agents_reward_per_action
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
    loss = loss.mean()
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.sgd(loss, params, self.LEARN_RATE)
    givens = {
        states: current_states,
        expected: agents_expected_reward,
        real_rewards: rewards
    }
    train_fn = theano.function([input_var, target_var], loss,
                               updates=updates, on_unused_input='warn',
                               givens=givens,
                               allow_input_downcast='True')
    self._train_err += train_fn(previous_states, agents_expected_reward)
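To make the target computation in the loop above easier to inspect outside of Theano, here is a minimal NumPy sketch of the same rule: a terminal transition gets the raw reward, and every other transition gets the reward plus the discounted maximum of the network's output for the next state. The function name `expected_rewards` and the discount value are assumptions made for illustration; they do not appear in the post. The last lines only check the axis reordering done by `np.rollaxis(current_states, 3, 1)`.

```python
import numpy as np

# Assumed value for illustration; the post does not state FUTURE_REWARD_DISCOUNT.
FUTURE_REWARD_DISCOUNT = 0.99


def expected_rewards(rewards, terminals, next_outputs,
                     discount=FUTURE_REWARD_DISCOUNT):
    """Hypothetical helper mirroring the loop in _train.

    rewards:      shape (batch,), reward observed for each transition
    terminals:    shape (batch,), True where the episode ended
    next_outputs: shape (batch, n_actions), network output for the next state
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals, dtype=bool)
    best_next = np.max(np.asarray(next_outputs, dtype=np.float64), axis=1)
    # Terminal transitions keep the raw reward; the others add the
    # discounted best output for the next state.
    return np.where(terminals, rewards, rewards + discount * best_next)


# Axis reordering: (batch, height, width, channels) -> (batch, channels, height, width)
batch = np.zeros((32, 80, 80, 4))
reordered_shape = np.rollaxis(batch, 3, 1).shape
```

Note that the values returned here are real numbers, not class indices, which matters for the loss function that consumes them.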
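I don't understand why this happens with 50,000 observations but not with 500. The only thing that changed is the number of observations, so why would a value suddenly go out of bounds? Any ideas about why this happens? Every answer is much appreciated. Thanks.
The full code is here:
Edit: It also happens with 1000 observations and above. I still don't know why.
Edit 2: I found that the problem is related to the rewards. Because the model has not been trained yet, it mostly moves randomly, which is why the problem shows up with more observations (there is a higher chance of a reward above 5). The error occurs when the reward (the length of the snake) is higher than 5, which is odd, because 5 is the number of possible actions (the outputs of the neural network). Getting closer to a solution!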
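The link between the reward value and the number of outputs noted in Edit 2 is consistent with what the traceback shows: `targets` arrives as an `int32` vector of shape `(32,)`, and the failing op is a one-hot cross-entropy, which reads each target as a class index into the 5 outputs. The following NumPy sketch imitates that behaviour; it is an illustration of the idea, not Theano's actual implementation, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def crossentropy_1hot(logits, targets):
    """Cross-entropy where each target is an integer class index."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    targets = np.asarray(targets)
    n_classes = probs.shape[1]
    # A class index must select one of the n_classes columns.
    if np.any(targets < 0) or np.any(targets >= n_classes):
        raise ValueError("y_i value out of bounds")
    return -np.log(probs[np.arange(len(targets)), targets])


logits = np.zeros((3, 5))                   # 3 samples, 5 outputs
ok = crossentropy_1hot(logits, [0, 2, 4])   # valid indices 0..4

try:
    crossentropy_1hot(logits, [0, 2, 6])    # 6 is not a valid index
    raised = False
except ValueError:
    raised = True
```

Under this reading, small rewards pass the check only by accident, because they happen to be valid indices. If the targets are meant to be real-valued expected rewards rather than class labels, a regression loss such as `lasagne.objectives.squared_error` may be a better fit; that is a suggestion, not something the post confirms.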