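I want to specify a PyMC model with a custom log-probability function over the trajectories of a policy. To evaluate this function, I first need to convert the representation of the sampled policy from an array of regression-model weights into a tabular policy indexed by state and action. This involves taking every possible state (a NumPy array) and computing the distribution over actions with a logistic regression model implemented in Theano. However, I am not sure how to implement this so that the correct computational graph is set up and values are passed through when needed.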
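For reference, the mapping described above for a single state can be written in plain NumPy as the minimal sketch below. The function name `state_action_distribution` and the example values are hypothetical and are not part of the model code; the max-shift is a standard numerical-stability step that does not change the resulting probabilities.

```python
import numpy as np

def state_action_distribution(weights, state):
    """Softmax distribution over actions for one state.

    weights: (2*dim, dim) array, one row of regression weights per action.
    state:   length-dim sequence of state coordinates.
    """
    logits = weights @ np.asarray(state, dtype=float)
    logits = logits - logits.max()  # shift for numerical stability
    expd = np.exp(logits)
    return expd / expd.sum()

# Hypothetical example: dim = 2, so there are 2*dim = 4 actions.
dim = 2
weights = np.zeros((2 * dim, dim))  # all-zero weights give a uniform policy
probs = state_action_distribution(weights, (1, 0))
```

The tabular policy is this distribution evaluated at every state, for every observation index and agent index.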
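Below is my current (non-working) attempt at this function, together with the function that computes the log-probability of the trajectories under a given policy. I have also tried having the pol2PolsArray function return a Theano tensor built with the inc_subtensor method, although, given that all the states are actually constants, I have not worked out how to implement that correctly (since I am computing the value for each state).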
    def pol2PolsArray(self, pols):
        # Takes A (called 'pols') for joint policies and maps to joint pols array ('polsArray')
        # Input: (num_obs, 2, 2*dim, dim)-shaped theano variable
        # Output: (num_obs, 2, length, ..., length, 2*dim)-shaped numpy array
        theano.config.compute_test_value = 'warn'
        pols.tag.test_value = np.zeros((self.obs.num_obs, 2, 2*self.obs.environment.dim, self.obs.environment.dim))
        # Initialise theano variables
        state = tt.dvector('state')
        # state.tag.test_value = self.obs.environment.dim * (0,)
        ind0 = tt.bscalar('ind0')
        # state.tag.test_value = 0
        ind1 = tt.bscalar('ind1')
        # state.tag.test_value = 0
        # Create graph for computing softmax
        state_act_distn = tt.exp(tt.dot(pols[ind0, ind1], state)) / tt.sum(tt.exp(tt.dot(pols[ind0, ind1], state)))
        pol_softmax = theano.function(inputs=[state, ind0, ind1], outputs=state_act_distn)
        # Initialise polArray
        # Possibly boilerplate as shape is expressed elsewhere
        polsArray = np.zeros((self.obs.num_obs, 2) + self.obs.environment.dim*(self.obs.environment.length,) + (2*self.obs.environment.dim,))
        # Compute softmax for each state value, input into polsArray
        for np_ind0 in range(self.obs.num_obs):
            for np_ind1 in range(2):
                for np_state in product(range(self.obs.environment.length), repeat=self.obs.environment.dim):
                    polsArray[(np_ind0, np_ind1) + np_state] = pol_softmax(np_state, np_ind0, np_ind1)
        return polsArray

    def logp_traj(self, trajectories, pols):
        polsArray = self.pol2PolsArray(pols)
        return (trajectories * np.log(polsArray)).sum()
This produces the following error, which seems to arise when the graph is first compiled:
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute Subtensor{int8, int8}(pols, ScalarFromTensor.0, ScalarFromTensor.0), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.
How can I rewrite the pol2PolsArray method so that the graph is built correctly before the pols variable is assigned a value?
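For context, the MissingInputError is raised because the compiled function lists only state, ind0 and ind1 as inputs, while the graph also depends on the symbolic variable pols. One possible direction, shown here only as a NumPy sketch, is to drop the per-state loop: enumerate all states once as a constant array and compute every softmax in a single vectorised expression. The function name `pols_to_array` and the example shapes are hypothetical. Each step (tensor contraction, exp, sum over the last axis, reshape) has a counterpart in theano.tensor, so the same expression written with tt operations on the symbolic pols should stay symbolic without a theano.function call inside the method; that is an assumption and has not been verified against PyMC here.

```python
from itertools import product

import numpy as np

def pols_to_array(pols, length, dim):
    """Map weights of shape (num_obs, 2, 2*dim, dim) to a tabular policy of
    shape (num_obs, 2, length, ..., length, 2*dim)."""
    num_obs = pols.shape[0]
    # Enumerate all states once, in itertools.product order: (length**dim, dim).
    states = np.array(list(product(range(length), repeat=dim)), dtype=float)
    # logits[o, a, s, i] = sum_j pols[o, a, i, j] * states[s, j]
    logits = np.einsum('oaij,sj->oasi', pols, states)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    expd = np.exp(logits)
    probs = expd / expd.sum(axis=-1, keepdims=True)
    # C-order reshape matches product order, so the state axis unfolds correctly.
    return probs.reshape((num_obs, 2) + dim * (length,) + (2 * dim,))

# Hypothetical example: num_obs = 3, dim = 2, length = 5.
rng = np.random.default_rng(0)
pols = rng.normal(size=(3, 2, 4, 2))
table = pols_to_array(pols, length=5, dim=2)
```

Because the states are constants, only pols needs to remain a variable; the log-probability in logp_traj would then be computed from the resulting expression rather than from a filled-in NumPy array.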