Bayesian IRT model in Python using pymc

Time: 2015-03-13 20:33:21

Tags: python bayesian pymc

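I would like to estimate an Item Response Theory (IRT) model in Python. More specifically, take the typical IRT example of students taking a test. For each student, we observe whether they gave the correct answer to each question they answered on the test. This gives us a matrix of observed outcomes X, from which we want to estimate, for each question, (1) a difficulty parameter α and (2) a discrimination parameter β, so that we can also estimate each student's latent ability Y as a function of whether they answered each test question correctly, i.e. α + βX. The best example I could find of how to estimate this type of Bayesian IRT model with MCMC in Python is this example. What I do not understand from that example is where the X matrix, which records whether a student answered a test question correctly, enters the model. Here is a slightly modified version of that code, which aims to estimate each student's latent ability: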

#from pylab import * #Pylab will not install with pip so I just loaded numpy itself
from numpy import *
import numpy
from pymc import *
from pymc.Matplot import plot as mplot

numquestions = 300 # number of test items being simulated
numpeople = 10 # number of participants
numthetas = 1 # number of latent proficiency variables

generating = 0
theta_initial = zeros((numthetas, numpeople))
correctness = np.random.randint(2, size= numquestions * numpeople) == 1 #Produces Error
#correctness = np.random.randint(2, size= numquestions * numpeople) == -1 #all False code runs fine
#correctness = np.random.randint(2, size= numquestions * numpeople) != -1 #all True code throws error message

correctness.shape = (numquestions, numpeople)


# theta (proficiency params) are sampled from a normal distribution
theta = Normal("theta", mu=0, tau=1, value=theta_initial, observed= generating)


# question-parameters (IRT params) are sampled from normal distributions (though others were tried)
a = Normal("a", mu=1, tau=1, value=[[0.0] * numthetas] * numquestions)
# a = Exponential("a", beta=0.01, value=[[0.0] * numthetas] * numquestions)
b = Normal("b", mu=0, tau=1, value=[0.0] * numquestions)

# take vectors theta/a/b, return a vector of probabilities of each person getting each question correct
@deterministic
def sigmoid(theta=theta, a=a, b=b): 
    bs = repeat(reshape(b, (len(b), 1)), numpeople, 1)
    return np.zeros_like(1.0 / (1.0 + exp(bs - dot(a, theta)))) #np.zeros_like fixes error

# take the probabilities coming out of the sigmoid, and flip weighted coins
correct = Bernoulli('correct', p=sigmoid, value=correctness, observed=not generating)

# create a pymc simulation object, including all the above variables
m = MCMC([a,b,theta,sigmoid,correct])

# run an interactive MCMC sampling session
m.isample(iter=20000, burn=15000)


mydict = m.stats()
print(mydict['theta']['mean']) #Get ability parameters for each student

When I run the script, I get the error message:

pymc.Node.ZeroProbability: Stochastic correct's value is outside its support,
 or it forbids its parents' current values.

The traceback points to this line:

correct = Bernoulli('correct', p=sigmoid, value=correctness, observed=not generating)

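I checked the original script (which switches between generating outcomes from latent values and computing latent values from outcomes), and the correctness variable, which I believe is the X matrix of test outcomes described above, is full of False values. When I set correctness to all False values, the script runs to completion. However, that seems to say that every student got every question wrong, which would not make much sense. I thought the values might instead represent the correct answers to the questions, so I set every value in correctness to True, but that produces the same error. What am I doing wrong, and how can I use an IRT model with pymc to estimate latent ability from an X matrix of whether students answered the test questions correctly?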

1 answer:

Answer 0: (score: 6)

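You have been bitten by a sneaky part of Python. The global import of pymc replaced numpy's exp with a different exp. To get the exp you want, use np.exp in your sigmoid deterministic. (Where did np. come from, I wonder?)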

return 1.0 / (1.0 + np.exp(bs - dot(a, theta)))
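To sanity-check the values that feed the Bernoulli node without involving pymc at all, here is a minimal numpy-only sketch. The helper names `irt_probability` and `bernoulli_loglike`, the small array sizes, and the alternating outcome matrix are all assumptions made for illustration; they are not part of the question's script. It also suggests why an all-zero `p` only "works" when every outcome is False: a single True outcome under `p = 0` has log-probability minus infinity, which is presumably the condition that ZeroProbability reports.

```python
import numpy as np

num_questions, num_people, num_thetas = 4, 3, 1  # small illustrative sizes

rng = np.random.RandomState(0)
theta = rng.normal(size=(num_thetas, num_people))           # latent ability, one column per person
a = rng.normal(loc=1.0, size=(num_questions, num_thetas))   # discrimination, one row per question
b = rng.normal(size=num_questions)                          # difficulty, one value per question

# Deterministic mix of True and False outcomes (illustrative data, not real results).
correctness = (np.arange(num_questions * num_people) % 2 == 1).reshape(
    num_questions, num_people
)

def irt_probability(theta, a, b):
    """P(correct) for every (question, person) pair; hypothetical helper."""
    bs = b[:, np.newaxis]  # broadcast each question's difficulty across people
    return 1.0 / (1.0 + np.exp(bs - np.dot(a, theta)))

def bernoulli_loglike(p, observed):
    """Sum of log P(observed | p), written out by hand; hypothetical helper."""
    with np.errstate(divide="ignore"):  # log(0) is expected in the p = 0 case
        return np.sum(np.where(observed, np.log(p), np.log1p(-p)))

p = irt_probability(theta, a, b)
print(p.shape)                                                # (4, 3)
print(np.isfinite(bernoulli_loglike(p, correctness)))         # True
print(bernoulli_loglike(np.zeros_like(p), correctness))       # -inf
```

If the first two lines print as shown, the logistic form yields one valid probability per question and person, so a mixed True/False outcome matrix should be acceptable to a Bernoulli likelihood.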

It looks like you still have some debugging to do, but I hope this gets you unstuck. This is a good example of why I favor this pattern:

import numpy as np, pymc as pm
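The shadowing effect can be reproduced with the standard library alone. The sketch below stands in `math` and `cmath` for numpy and pymc (an assumption made purely so the example runs anywhere) and uses `exec` on a dictionary to mimic two star imports landing in the same namespace.

```python
import math
import cmath

# Mimic two consecutive star imports into one shared namespace.
namespace = {}
exec("from math import *", namespace)
first_exp = namespace["exp"]      # bare name after the first star import
exec("from cmath import *", namespace)
second_exp = namespace["exp"]     # same bare name after the second star import

# The later star import silently rebinds `exp` to a different function.
print(first_exp is math.exp)               # True
print(second_exp is cmath.exp)             # True
print(type(namespace["exp"](0.0)))         # <class 'complex'>

# Qualified names keep their meaning regardless of import order.
print(type(math.exp(0.0)), type(cmath.exp(0.0)))
```

With qualified names such as `np.exp` and `pm.Normal`, the origin of every function is visible at the call site, so a later import cannot change what an earlier name refers to.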