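I want to use a DQN agent with several continuous states (observations) and two action signals, each with three possible values, giving 9 combinations in total. For example, the following lines show what I mean: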
a = [-2,0,2];
b = [-3,0,3];
[A,B] = meshgrid(a,b);
actions = reshape(cat(2,A',B'),[],2);
To create the discrete action space, the matrix has to be converted to a cell array before running:
actionInfo = rlFiniteSetSpec(num2cell(actions,2));
actionInfo.Name = 'actions';
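As a quick check of what the action specification is built from, the sketch below rebuilds the action table with base MATLAB only and inspects its shape. The variable name actionCells is introduced here purely for illustration and does not appear in the original code.

```matlab
% Rebuild the action table from the two signals (same code as above).
a = [-2,0,2];
b = [-3,0,3];
[A,B] = meshgrid(a,b);
actions = reshape(cat(2,A',B'),[],2);   % 9x2 matrix, one [a b] pair per row

% Convert each row into its own cell (actionCells is a hypothetical name).
actionCells = num2cell(actions,2);      % 9x1 cell array of 1x2 row vectors

% Inspect the sizes and confirm that all nine rows are distinct.
disp(size(actions))
disp(size(actionCells))
disp(size(actionCells{1}))
disp(size(unique(actions,'rows'),1))
```

Each element passed to rlFiniteSetSpec is therefore a 1x2 row vector, and there are nine of them.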
In addition, DQN has a critic, which consists of a deep neural network. I created the critic as follows:
% Create a DNN for the critic:
hiddenLayerSize = 48;
observationPath = [
imageInputLayer([numObs 1 1],'Normalization','none',...
'Name','observation')
fullyConnectedLayer(hiddenLayerSize,'Name','CriticStateFC1')
reluLayer('Name','CriticReLu1')
fullyConnectedLayer(hiddenLayerSize,'Name','CriticStateFC2')
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonReLu1')
fullyConnectedLayer(hiddenLayerSize,'Name','CriticCommonFC1')
reluLayer('Name','CriticCommonReLu2')
fullyConnectedLayer(1,'Name','CriticOutput')];
actionPath = [
imageInputLayer([value 1 1],'Normalization','none','Name','action')
fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC1')];
% Create the layerGraph:
criticNetwork = layerGraph(observationPath);
criticNetwork = addLayers(criticNetwork,actionPath);
% Connect actionPath to observationPath:
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% Specify options for the critic representation:
criticOpts = rlRepresentationOptions('LearnRate',1e-03,...
'GradientThreshold',1,'UseDevice','gpu');
% Create the critic representation using the specified DNN and options:
critic = rlRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'observation'},'Action',{'action'},criticOpts);
% Set the desired options for the agent:
agentOptions = rlDQNAgentOptions(...
'SampleTime',dt,...
'UseDoubleDQN',true,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',0.99,...
'ExperienceBufferLength',1e7,...
'MiniBatchSize',128);
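My problem is the first image input layer of the action path, imageInputLayer([value 1 1],'Normalization','none','Name','action'). I tried 1, 2, 9 and 18 for value, but every one of them leads to an error when I run
agent = rlDQNAgent(critic,agentOptions);
This is because actionInfo contains a cell array of 9 elements, each a double vector of size [1,2], whereas the imageInputLayer has size [value,1,1].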
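So how do I set up a DQN agent in MATLAB with two main discrete action signals, each with three possible values? The agent will be used with a Simulink environment, and I am not sure how the Simulink reinforcement learning agent block handles two outputs.
Do I need to return a single index instead, and use a separate function to map it to the correct row of the action matrix?
Thanks in advance for your help!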
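To make the index-based idea concrete, here is a minimal sketch of what such a mapping could look like. It is not a confirmed solution to the error above. The helper name indexToAction and the scalar action set 1:9 are assumptions introduced for illustration only. The rlFiniteSetSpec call is left as a comment because it needs Reinforcement Learning Toolbox, while the rest runs in base MATLAB.

```matlab
% Action table: one [a b] pair per row, as defined earlier in the post.
a = [-2,0,2];
b = [-3,0,3];
[A,B] = meshgrid(a,b);
actions = reshape(cat(2,A',B'),[],2);   % 9x2

% Assumption: the agent chooses a scalar index 1..9 instead of a 1x2 vector.
% With Reinforcement Learning Toolbox this could be declared as:
% actionInfo = rlFiniteSetSpec(1:size(actions,1));

% Hypothetical helper: map a scalar index back to the two action signals.
indexToAction = @(idx) actions(idx,:);

first  = indexToAction(1);   % [-2 -3]
middle = indexToAction(5);   % [ 0  0]
last   = indexToAction(9);   % [ 2  3]
```

In a Simulink model the same lookup would have to sit between the agent output and the plant, for example in a MATLAB Function block. Whether that is preferable to a vector-valued action set is exactly the open question here.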