Question

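I am using a variant of Bayes' theorem to update the probabilities of all states, based on a custom likelihood function and on whether the user's input is correct or incorrect (1, 0). Each user response is to a question on a specific skill (skills = a-z), which gives 2^n = 2^26 = 67,108,864 possible states that must be updated. To get the denominator of the Bayes equation, I have to compute P(data) = the sum, over every state, of the likelihood given the data times the prior probability. The new probability of each state is then just its likelihood times its prior, divided by that denominator, which I have put in vector notation. The computation is simple, but every state has to be processed with a for loop or a map call on every update, which creates a huge bottleneck.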
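In symbols, the update described above can be written roughly as follows. The notation is introduced here only for illustration and does not appear in the code: $s$ is one of the $2^{26}$ states, $x \in \{0, 1\}$ is the response, $L(x \mid s)$ is the custom likelihood, and $P(s)$ is the prior.

$$
P(\text{data}) = \sum_{s'} L(x \mid s')\,P(s'), \qquad
P(s \mid x) = \frac{L(x \mid s)\,P(s)}{P(\text{data})}
$$

The denominator is a single sum over all states, so one update touches every state exactly once.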

Here is what I have tried. Keep in mind that len(states) = 67,108,864:

import numpy as np

# `states` is a global sequence; judging by how it is indexed below,
# each entry is presumably a (skills, prior probability) pair.

def like(ans, d, match):
    """
    Calculates the likelihood function
    :param ans: 1, 0 denoting correct or incorrect response
    :param d: question difficulty
    :param match: True if question space matches part of learning space
    :return: likelihood value
    """
    if (ans == 1 and not match) or (ans == 0 and match):
        return 1
    else:
        # (ans == 0 and not match) or (ans == 1 and match)
        return 1/(1-d)

def p_s_theta(ans, d, space):
    """
    Calculates the data prob and returns posterior prob of all spaces
    :param ans: 1, 0 denoting correct or incorrect response
    :param d: question difficulty
    :param space: learning space of question asked (a-z)
    :return: posterior probability of all learning spaces
    """
    def p_prime(s):
        # Unnormalized posterior of one state: likelihood * prior
        if space in s[0]:
            return like(ans, d, True) * s[1]
        else:
            return like(ans, d, False) * s[1]
    p_theta = list(map(p_prime, states))

    # P(data): the normalizing sum over all states
    p_t = sum(p_theta)
    return np.asarray(p_theta)/p_t

Before switching to map, I had the same logic in the less efficient form of a for loop, which was much slower:

def p_s_theta(ans, d, space):
    """
    Calculates the data prob and returns posterior prob of all spaces
    :param ans: 1, 0 denoting correct or incorrect response
    :param d: question difficulty
    :param space: learning space of question asked
    :return: posterior probability of all learning spaces
    """
    p_theta = []
    for s in states:
        # Unnormalized posterior of one state: likelihood * prior
        if space in s[0]:
            p_theta.append(like(ans, d, True) * s[1])
        else:
            p_theta.append(like(ans, d, False) * s[1])
    # P(data): the normalizing sum over all states
    p_t = sum(p_theta)
    return np.asarray(p_theta)/p_t

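I have also tried list comprehensions and multiprocessing, with no luck. My question is whether anyone knows a way to iterate over millions of states, applying an arbitrary computation, that is faster than a for loop or map. Alternatively, if anyone knows a parallel-processing approach that can cope with the overhead of the rest of the program, I think that would also be a workable solution.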
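One direction that might avoid the Python-level loop is sketched below. It rests on assumptions that are not part of the code above: each state is encoded as an integer from 0 to 2^n - 1 whose bit k is set when skill k belongs to the state, and the priors live in one contiguous NumPy array indexed by that integer. The function name `update` and the parameter `k` are hypothetical. Since the likelihood takes only two distinct values per question, the sketch scales the two halves of the array in bulk instead of calling `like` once per state.

```python
import numpy as np

def like(ans, d, match):
    # Same likelihood as in the question (d must not equal 1).
    if (ans == 1 and not match) or (ans == 0 and match):
        return 1.0
    return 1.0 / (1.0 - d)

def update(prior, ans, d, k):
    """
    Hypothetical vectorized update.
    prior[i] is the prior of state i; bit k of i is assumed to be set
    exactly when skill k belongs to state i.
    """
    post = prior.copy()
    # Reshape so that axis 1 corresponds to bit k of the state index:
    # i = a * 2**(k+1) + b * 2**k + c, with b in {0, 1}.
    view = post.reshape(-1, 2, 1 << k)
    view[:, 0, :] *= like(ans, d, False)  # states without skill k
    view[:, 1, :] *= like(ans, d, True)   # states containing skill k
    post /= post.sum()                    # divide by P(data)
    return post

# Small usage example with n = 3 skills (8 states) and a uniform prior.
prior = np.full(8, 1 / 8)
post = update(prior, ans=1, d=0.5, k=1)
```

With n = 26 a float64 prior array takes about 512 MB, so whether this fits depends on the machine; the sketch has not been measured against the loop versions above.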

for loop over 2^n items bottleneck

0 answers: