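I am using scipy.optimize.minimize to try to determine the optimal parameters of a probability density function (PDF). My PDF involves the discrete Gaussian kernel (https://en.wikipedia.org/wiki/Gaussian_function and https://en.wikipedia.org/wiki/Scale_space_implementation#The_discrete_Gaussian_kernel).
In theory, I know the average value of the PDF (where the PDF should be centered). So if I calculate the expectation value of my PDF, I should recover the average value that I already know. My PDF is sampled at discrete values of n (which must never be negative and should start at 0 to make any physical sense), and I am trying to determine the optimal value of t (the "scaling factor") that recovers the average value of the PDF (which I already know ahead of time).
My minimal working example to determine the optimal "scaling factor" t is the following: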
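To make the property I am relying on concrete, here is a minimal sketch (separate from my actual script) that checks it for the textbook case. It assumes an integer center and a symmetric window that ignores my n >= 0 constraint; `kernel_moments` and `half_width` are names I made up for this illustration, and `ive` is SciPy's exponentially scaled Bessel function, which equals exp(-t) * iv(n, t) for t >= 0.

```python
import numpy as np
from scipy.special import ive


def discrete_gaussian_kernel(t, n):
    # ive(n, t) equals exp(-|t|) * iv(n, t), so for t >= 0 this is
    # exp(-t) * I_n(t) without a separate, overflow-prone exp call.
    return ive(n, t)


def kernel_moments(t, center, half_width=30):
    # `center` is assumed to be an integer so that the Bessel order
    # n - center stays an integer; half_width is an arbitrary cutoff.
    n = np.arange(center - half_width, center + half_width + 1)
    weights = discrete_gaussian_kernel(t, n - center)
    return weights.sum(), (n * weights).sum()


total, mean = kernel_moments(t=2.0, center=8)
print("total probability:", total)
print("expectation value:", mean)
```

Under those assumptions the weights should sum to essentially 1 and the expectation value should land on the center, which is the behavior I expected my own code to reproduce.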
#!/usr/bin/env python3

import numpy as np
from scipy.special import iv
from scipy.optimize import minimize

def discrete_gaussian_kernel(t, n):
    return np.exp(-t) * iv(n, t)

def expectation_value(t, average):
    # One constraint is that the starting value
    # of the range over which I sample the PDF
    # should be 0.

    # Method 1 - This seems to give good, consistent results
    int_average = int(average)
    ceiling_average = int(np.ceil(average))
    N = range(int_average - ceiling_average + 1,
              int_average + ceiling_average + 2)

    # Method 2 - The multiplicative factor for 'end' is arbitrary.
    # I should in principle be able to make end be as large as
    # I want since the PDF goes to zero for large values of n,
    # but this seems to impact the result and I do not know why.
    #start = 0
    #end = 2 * int(average)
    #N = range(start, end)

    return np.sum([n * discrete_gaussian_kernel(t, n - average) for n in N])

def minimize_function(t, average):
    return average - expectation_value(t, average)

if __name__ == '__main__':
    average = 8.33342
    #average = 7.33342

    solution = minimize(fun = minimize_function,
                        x0 = 1,
                        args = average)

    print(solution)

    t = solution.x[0]

    print('          solution t =', t)
    print('       given average =', average)
    print('recalculated average =', expectation_value(t, average))
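My minimal working example has two problems:
1) The code works for some of the values I choose for the variable "average". One example is the value 8.33342. However, the code fails for other values, such as 7.33342. In that case, I get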
RuntimeWarning: overflow encountered in exp
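So I suspect that scipy.optimize.minimize is choosing a bad value for t (such as a large negative number). I am confident this is the problem because I printed the value of t inside the function expectation_value, and t becomes increasingly negative. So I would like to add bounds on the values that "t" can take ("t" should not be negative). Looking at the documentation of scipy.optimize.minimize, there is a bounds keyword argument. So I tried: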
solution = minimize(fun = minimize_function,
                    x0 = 1,
                    args = average,
                    bounds = ((0, None)))
but I get the error:
ValueError: length of x0 != length of bounds
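I searched stackoverflow for this error, and there are other threads, but I did not find any of them helpful. How can I set the bounds successfully?
2) My other question has to do with scipy.optimize.minimize being sensitive to the range over which I calculate the expectation value. For an average value of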
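A note on the shape of the bounds argument, offered as a sketch and not as a confirmed fix for my script: as far as I can tell, bounds expects one (min, max) pair per parameter, and the extra parentheses in ((0, None)) do not create a nested tuple. The objective below is a hypothetical stand-in (a simple quadratic, not my expectation-value function), chosen only so that the effect of the lower bound is easy to see.

```python
from scipy.optimize import minimize


def objective(t):
    # Hypothetical stand-in objective whose unconstrained minimum is t = -1.
    return (t[0] + 1.0) ** 2


# Redundant parentheses do not create a nested tuple:
# ((0, None)) is just (0, None), which has length 2, not 1.
print(len(((0, None))))

# One (min, max) pair per parameter, wrapped in a list.
solution = minimize(fun=objective, x0=[1.0], bounds=[(0, None)])
print(solution.x[0])
```

With a single parameter, [(0, None)] or ((0, None),) with a trailing comma both have length 1, matching the length of x0, whereas (0, None) has length 2.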
average = 8.33342
and the method of calculating the range as
# Method 1 - This seems to give good, consistent results
int_average = int(average)
ceiling_average = int(np.ceil(average))
N = range(int_average - ceiling_average + 1,
          int_average + ceiling_average + 2)
the "recalculated average" is 8.3329696426. But for the other method (which has a very similar range),
# Method 2 - The multiplicative factor for 'end' is arbitrary.
# I should in principle be able to make end be as large as
# I want since the PDF goes to zero for large values of n,
# but this seems to impact the result and I do not know why.
start = 0
end = 2 * int(average)
N = range(start, end)
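the "recalculated average" is 8.31991111857. The ranges are similar in each case, so I don't know why there is such a large change, especially since I want my recalculated average to be as close as possible to the true average. And if I extend the range to larger values (which I think is reasonable, since the PDF goes to zero there),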
start = 0
end = 4 * int(average)
N = range(start, end)
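the "recalculated average" is 9.12939372912, which is even worse. So is there a consistent method to calculate the range so that the recalculated average is always as close as possible to the true average? The scaling factor should be able to take on any value, so I would think scipy.optimize.minimize should be able to find a scaling factor that recovers the true average exactly.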
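One observation that may or may not be related to the range sensitivity: in my script the order passed to iv is n - average, which is not an integer when average is 8.33342. I have not confirmed that this explains the numbers above, but the small check below (with an arbitrary t = 1.0 picked for illustration) shows that the modified Bessel function is symmetric in its order only for integer orders.

```python
import numpy as np
from scipy.special import iv

t = 1.0  # arbitrary positive scale chosen for illustration

# Integer order: I_{-n}(t) == I_n(t), so the kernel is symmetric about its center.
print(iv(-2, t), iv(2, t))

# Non-integer order: I_{-v}(t) != I_v(t); for v = 0.5 the closed forms are
# sqrt(2 / (pi t)) * cosh(t) and sqrt(2 / (pi t)) * sinh(t).
print(iv(-0.5, t), iv(0.5, t))
```

If the weights are not symmetric about the average, I would expect the sum of n times the weight to depend on which values of n are included, so this may be part of what I am seeing.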