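I am trying to write a program that plays a game similar to an Atari game. Details:
In this game, objects fall at different angles and in different directions, and the agent's goal is to intercept them using a turret. Each state consists of the feedback the environment returns after each action: the turret angle, the object's position, the interception position, and the total score. The agent can choose among four possible actions.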
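To make the state description concrete, here is a minimal sketch of how one feedback tuple could be packed into a flat vector. The function name `encode_state`, the field order, and the assumption that both positions are 2-D `(x, y)` pairs are illustrative only and are not taken from the actual program.

```python
import numpy as np

NUM_ACTIONS = 4  # the agent chooses among four possible actions


def encode_state(turret_angle, object_pos, intercept_pos, total_score):
    """Pack one feedback tuple into a flat float32 vector."""
    # Assumption: both positions are 2-D (x, y) pairs.
    return np.array(
        [turret_angle, *object_pos, *intercept_pos, total_score],
        dtype=np.float32,
    )


# Made-up example values, only to show the resulting shape.
state = encode_state(45.0, (3.0, 7.5), (2.0, 0.0), 10.0)
print(state.shape)  # (6,)
```

Under these assumptions a state is a short vector of six numbers rather than an image.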
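I decided to implement the program with deep Q-learning, using a loss function and gradient descent. I am using TensorFlow with Python 3.6.7.
I ran into the following error: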
C:\Users\elinor\AppData\Local\Programs\Python\Python36\python.exe C:/Users/elinor/Desktop/Intrcpt/ddqn.py
2019-10-15 13:14:15.946083: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten (Flatten) (None, 1024) 0
_________________________________________________________________
dense (Dense) (None, 64) 65600
_________________________________________________________________
dense_1 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten_1 (Flatten) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 64) 65600
_________________________________________________________________
dense_3 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
File "C:/Users/elinor/Desktop/Intrcpt/ddqn.py", line 162, in <module>
exec_process()
File "C:/Users/elinor/Desktop/Intrcpt/ddqn.py", line 82, in exec_process
history = network.fit(state, epochs=10, batch_size=10)
File "C:\Users\elinor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "C:\Users\elinor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 224, in fit
distribution_strategy=strategy)
File "C:\Users\elinor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 547, in _process_training_inputs
use_multiprocessing=use_multiprocessing)
File "C:\Users\elinor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 594, in _process_inputs
steps=steps)
File "C:\Users\elinor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 2472, in _standardize_user_data
exception_prefix='input')
File "C:\Users\elinor\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py", line 574, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected conv2d_input to have shape (32, 32, 3) but got array with shape (1, 1, 1)
Process finished with exit code 1
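I don't understand how to fix this. I know the problem is with the dimensions, but I don't know how to write them correctly, even though I have tried many syntax variants I found online. Does anyone know how I can solve it?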
This is the problematic part:
`state = tf.reshape(state, shape=(1,1,1,1))`
`history = network.fit(state, epochs=10, batch_size=10)`
And also:
`state = np.ndarray(shape=(1, 1, 1, 1), dtype=float)`
The full code is here: link
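The mismatch reported in the traceback can be reproduced with NumPy alone. The sketch below is an illustration, not code from the linked program: `check_batch` is a hypothetical helper, and the expected per-sample shape `(32, 32, 3)` is copied from the error message.

```python
import numpy as np

EXPECTED_SAMPLE_SHAPE = (32, 32, 3)  # taken from the error message


def check_batch(batch, sample_shape=EXPECTED_SAMPLE_SHAPE):
    """Return True if `batch` has shape (N, *sample_shape)."""
    return batch.ndim == len(sample_shape) + 1 and batch.shape[1:] == sample_shape


bad = np.zeros((1, 1, 1, 1), dtype=np.float32)     # shape used in the question
good = np.zeros((1, 32, 32, 3), dtype=np.float32)  # one sample of the expected shape

print(check_batch(bad))   # False
print(check_batch(good))  # True

# A (1, 1, 1, 1) array holds a single number, so it cannot be reshaped into
# the 32 * 32 * 3 = 3072 values that one sample needs.
try:
    bad.reshape((1, 32, 32, 3))
except ValueError as err:
    print("reshape failed:", err)
```

The first axis is the batch dimension, which is why the error message quotes only `(1, 1, 1)` for a `(1, 1, 1, 1)` array. Separately, `np.ndarray(shape=...)` allocates uninitialized memory, so the single value it contains is arbitrary.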
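Also, I'm not sure whether I should be passing parameters to the training step. I used a few guides to write this program, but I can't wrap my head around one main issue: at what stage are parameters such as the state and the reward passed to the network? Perhaps someone has some insight and can help me understand whether I'm doing something wrong.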
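For context on where the state and the reward usually enter in deep Q-learning: the state is the network input, while the reward appears only in the regression target the network is trained toward. The sketch below shows that step with plain NumPy arrays standing in for network outputs. The name `q_targets`, the value of `GAMMA`, and the array layout are assumptions for illustration, and this is the plain DQN target rather than a double-DQN variant.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)


def q_targets(q_current, q_next, actions, rewards, dones, gamma=GAMMA):
    """Build regression targets for a batch of transitions.

    q_current: (N, A) network output for the states s
    q_next:    (N, A) network output for the next states s'
    actions:   (N,)   index of the action taken in each s
    rewards:   (N,)   reward received after that action
    dones:     (N,)   1.0 if the episode ended, else 0.0
    """
    targets = q_current.copy()
    # Bellman target: r + gamma * max_a' Q(s', a'); no bootstrap at episode end.
    bootstrap = rewards + gamma * (1.0 - dones) * q_next.max(axis=1)
    # Only the entry of the action actually taken is changed.
    targets[np.arange(len(actions)), actions] = bootstrap
    return targets


# Two made-up transitions with four actions each.
q_current = np.zeros((2, 4))
q_next = np.array([[0.0, 1.0, 0.5, 0.2], [2.0, 0.0, 0.0, 0.0]])
targets = q_targets(
    q_current,
    q_next,
    actions=np.array([1, 3]),
    rewards=np.array([1.0, -1.0]),
    dones=np.array([0.0, 1.0]),
)
print(targets)
```

In Keras terms, a batch of states would be the `x` argument and these targets the `y` argument of `fit`. The `network.fit(state, epochs=10, batch_size=10)` call in the traceback passes only `state` and no targets.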
Thank you very much for your time and effort!