pymc: Dirichlet whose concentration parameters depend on an input variable

Time: 2018-09-20 13:26:05

Tags: python bayesian pymc multinomial dirichlet

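I am trying to build a model in which the concentration parameters of a Dirichlet variable depend on another variable.

The situation is as follows:

A system fails because one of its components fails (there are three components, and exactly one fails in each test/observation).

The probability that a given component fails depends on temperature.

Here is a short, commented implementation of this situation: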

import numpy as np
import pymc as pm


# Temperature data : 3 cold temperatures and 3 warm temperatures
T_data = np.array([10, 12, 14, 80, 90, 95])

# Failure data for the 3 components: [0, 0, 1] means component 3 failed
F_data = np.array([[0, 0, 1],
                   [0, 0, 1],
                   [0, 0, 1],
                   [1, 0, 0],
                   [1, 0, 0],
                   [1, 0, 0]])

n_component = 3

# When temperature is cold : Component 3 fails
# When temperature is warm : Component 1 fails
# Component 2 never fails

# Number of observations :
n_obs = len(F_data)


# Each observed failure can be modeled as a Multinomial F ~ M(n_test, p) with parameters
# -  n_test : number of tests per observation (fixed, 1 here)
# -  p : probability of failure of each component (shape (n_obs, 3))

# The probability of failure of components follows a Dirichlet distribution p ~ Dir(alpha) with parameters:
# -  alpha : concentration (shape (n_obs, 3))
# The Dirichlet distribution ensures the probabilities sum to 1

# The alpha parameters (and therefore the failure probabilities) depend on the temperature: alpha ~ a + b * T
# - a : bias term (shape (1,3))
# - b : describes temperature dependency of alpha (shape (1,3))

# The prior on "a" is a normal distribution with mean 1/2 and std 0.001
# a ~ N(1/2, 0.001)

# The prior on "b" is a normal distribution with mean 0 and std 0.001
# b ~ N(0, 0.001)


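# Coding it all with pymc
# pm.Normal takes the mean followed by the precision tau = 1 / std**2
# 0.5 is written out so the mean is not truncated to 0 by integer division on Python 2
a = pm.Normal('a', 0.5, 1 / (0.001**2), size=n_component)
b = pm.Normal('b', 0, 1 / (0.001**2), size=n_component)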


# I generate 3 alphas values (corresponding to the 3 components) for each of the 6 temperatures
@pm.deterministic
def alphas(t=T_data, a=a, b=b):
    return a + np.outer(t, b)

# I think I should get 3 probabilities (corresponding to the 3 components) for each of the 6 temperatures
# Why does pm.Dirichlet return a (1,3) array instead of (6,3) ?
p = pm.Dirichlet('p', alphas)

# The Multinomial is observed and takes its values from F_data
F = pm.Multinomial('F', 1, p, value = F_data, observed = True)


# Running MCMC
M = pm.MCMC([a, b, alphas, p, F])
M.sample(iter = 10000, burn = 1000)
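
The shapes that the comments above aim for can be checked without pymc. The following is a minimal NumPy sketch, not the pymc model itself: `a_example` and `b_example` are made-up values (not taken from the question), chosen only so that every concentration stays positive, and the probabilities are drawn one row at a time.

```python
import numpy as np

# Temperatures copied from the question
T_data = np.array([10, 12, 14, 80, 90, 95])

# Hypothetical values (not from the question), chosen only so that
# every concentration a + b * T stays strictly positive
a_example = np.array([1.0, 1.0, 10.0])
b_example = np.array([0.1, 0.0, -0.1])

# One row of concentrations per temperature: shape (6, 3)
alphas_example = a_example + np.outer(T_data, b_example)

# Draw one probability vector per row of concentrations
rng = np.random.default_rng(0)
p_example = np.vstack([rng.dirichlet(row) for row in alphas_example])

print(alphas_example.shape)
print(p_example.shape)
print(p_example.sum(axis=1))  # each row sums to 1 up to rounding
```

This only illustrates the intended (6, 3) layout of one probability vector per temperature; it says nothing about how pymc stores the value of a Dirichlet node.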

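The problems I am running into are the following:

  • Why does the Dirichlet return only a single set of probabilities, shape = (1,3), instead of one set per temperature, shape = (6,3)?
  • Why does the model fail to pick up the temperature dependency that is present in the data? (The values of b do not show the expected trend.)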
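
For the second point, one thing that can be checked by hand is how much room the stated priors leave for a temperature effect. The sketch below is plain arithmetic on the prior parameters from the question; the cutoff of three standard deviations (`k`) is an arbitrary assumption used only to get a rough range.

```python
import numpy as np

# Temperatures and prior parameters copied from the question
T_data = np.array([10, 12, 14, 80, 90, 95])
a_mean, a_std = 0.5, 0.001
b_mean, b_std = 0.0, 0.001

# Assumption: treat values within k standard deviations as "plausible" under the prior
k = 3

# Largest plausible slope and the shift it produces at the hottest temperature
b_extreme = b_mean + k * b_std
alpha_shift_max = b_extreme * T_data.max()

# Rough range of alpha = a + b * T over all temperatures under the prior
alpha_low = (a_mean - k * a_std) - alpha_shift_max
alpha_high = (a_mean + k * a_std) + alpha_shift_max

print(alpha_shift_max, alpha_low, alpha_high)
```

Under these priors alpha stays within roughly 0.21 to 0.79 for every component at every temperature, so the priors themselves allow only a modest temperature dependence. Whether that fully explains the flat estimate of b is not established by this check.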

Checking the results:

print('Shape of a :', a.value.shape)
print('Shape of b :', b.value.shape)
print('Shape of alpha :', alphas.value.shape)
print('Shape of p :', p.value.shape)
print(' ')
print('Value of b :', b.trace[:].mean(axis=0))

# The model clearly doesn't find the temperature dependency of the failure probability
# I would expect for b something like 
# [positive value, close to zero value, negative value]

Shape of a : (3,)
Shape of b : (3,)
Shape of alpha : (6, 3)
Shape of p : (2,)

Value of b : [1.11563624e-05 -1.86813307e-05 9.28859752e-05]
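
To put the reported values of b in context, they can be expressed in units of the prior standard deviation. This is a small sketch using only the numbers printed above and the prior std of 0.001 stated in the model comments.

```python
import numpy as np

# Posterior means of b as reported above
b_posterior_mean = np.array([1.11563624e-05, -1.86813307e-05, 9.28859752e-05])

# Prior standard deviation of b, from the model comments
b_prior_std = 0.001

# Posterior mean expressed in prior standard deviations
ratio = b_posterior_mean / b_prior_std
print(ratio)
```

All three ratios are below 0.1 in absolute value, meaning the estimates sit well inside one prior standard deviation of the prior mean of 0.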

Any suggestions?

0 answers:

No answers yet.