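I have calculated the loads on a bridge, and I want to fit a Gumbel distribution to the highest 20% of them using maximum likelihood estimation. I need help calculating the distribution's parameters. I have read the scipy.optimize documentation, but I can't work out how to use its functions to estimate a two-parameter function.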
Here is some theory that may help:
There are two likelihood functions (L1 and L2): one for values above a certain threshold (x >= C) and one for values below it (x < C).

Here is some code I wrote:

```python
import math

non_truncated_data = [
    15.999737471905252, 16.105716234887431, 17.947809230275304, 16.147752064149291,
    15.991427126788327, 16.687542227378565, 17.125139229445359, 19.39645340792385,
    16.837044960487795, 15.804473320190725, 16.018569387471025, 16.600876724289019,
    16.161306985203151, 17.338636901595873, 18.477371969176406, 17.897236722220281,
    16.626465201654593, 16.196548622931672, 16.013794215070927, 16.30367884232831,
    17.182106070966608, 18.984566931768452, 16.885737663740024, 16.088051117522948,
    15.790480003140173, 18.160947973898388, 18.318158853376037,
]
threshold = 15.78581825859324


def maximum_likelihood_function(non_truncated_loads, threshold, loc, scale):
    """Calculates the likelihood function's value for the given truncated data
    with the given parameters.

    The likelihood function for truncated data is L1 * L2, where L1 is the
    product of the pdf values at the known non-truncated values
    (non_truncated_loads) and L2 is the probability that the threshold value
    will be exceeded.
    """
    # calculates L1: product of the pdf values at every observed load
    L1 = 1.0
    for x in non_truncated_loads:
        L1 *= gumbel_pdf(x, loc, scale)
    # calculates L2: probability of exceeding the threshold, 1 - F(C)
    cdf_at_threshold = gumbel_cdf(threshold, loc, scale)
    L2 = 1 - cdf_at_threshold
    return L1 * L2


def gumbel_pdf(x, loc, scale):
    """Returns the value of Gumbel's pdf with parameters loc and scale at x."""
    # substitute
    z = (x - loc) / scale
    return (1 / scale) * math.exp(-(z + math.exp(-z)))


def gumbel_cdf(x, loc, scale):
    """Returns the value of Gumbel's cdf with parameters loc and scale at x."""
    return math.exp(-math.exp(-(x - loc) / scale))
```
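For reference, here is the model the code above evaluates. The notation μ = loc, β = scale and C = threshold is introduced here only for readability. Because scipy.optimize minimizes, in practice one minimizes −log L instead of maximizing L:

$$
f(x;\mu,\beta) = \frac{1}{\beta}\exp\!\bigl(-(z + e^{-z})\bigr), \qquad
F(x;\mu,\beta) = \exp\!\bigl(-e^{-z}\bigr), \qquad z = \frac{x-\mu}{\beta}
$$

$$
L(\mu,\beta) = L_1 \cdot L_2 = \Bigl[\prod_{i=1}^{n} f(x_i;\mu,\beta)\Bigr]\bigl(1 - F(C;\mu,\beta)\bigr),
\qquad
-\log L = -\sum_{i=1}^{n} \log f(x_i;\mu,\beta) - \log\bigl(1 - F(C;\mu,\beta)\bigr)
$$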
Answer (score: 1)
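First, regarding scipy.optimize: the simplest way to optimize a function is to write the objective function so that its first argument is the list of parameters to be optimized, and the remaining arguments supply everything else, such as the data and any fixed parameters.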
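Here is a minimal illustration of that convention, using a toy objective that is not part of the original answer. The parameters come first as an array `p`, and the data are passed through `fmin`'s `args` tuple. Minimizing the sum of squared deviations should land close to the sample mean:

```python
import numpy as np
import scipy.optimize as so


def squared_error(p, x):
    """Toy objective: p holds the parameters to optimize, x is fixed data."""
    centre = p[0]
    return ((x - centre) ** 2).sum()


data = np.array([1.0, 2.0, 3.0, 4.0])
# Parameters go first (initial guess [0.0]); the data are passed via args.
best = so.fmin(squared_error, [0.0], args=(data,), disp=False)
print(best)  # should be close to [2.5], the sample mean
```

The same pattern carries over to the likelihood: `p` would hold `loc` and `scale`, and `args` would carry the loads.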
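Second, use numpy, so that the functions operate on whole arrays at once instead of on one value at a time. That gives us: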
In [61]:
```python
import numpy as np

# modified pdf and cdf
def gumbel_pdf(x, loc, scale):
    """Returns the value of Gumbel's pdf with parameters loc and scale at x."""
    # substitute
    z = (x - loc) / scale
    return (1. / scale) * np.exp(-(z + np.exp(-z)))

def gumbel_cdf(x, loc, scale):
    """Returns the value of Gumbel's cdf with parameters loc and scale at x."""
    return np.exp(-np.exp(-(x - loc) / scale))
```
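You can sanity-check these hand-written functions against `scipy.stats.gumbel_r`, SciPy's right-skewed Gumbel distribution, which uses the same loc/scale convention. The sketch below assumes that correspondence and checks it numerically. The evaluation grid is arbitrary, and the parameters are the simple-fit values from Out[64]:

```python
import numpy as np
from scipy import stats


def gumbel_pdf(x, loc, scale):
    z = (x - loc) / scale
    return (1. / scale) * np.exp(-(z + np.exp(-z)))


def gumbel_cdf(x, loc, scale):
    return np.exp(-np.exp(-(x - loc) / scale))


x = np.linspace(14.0, 20.0, 13)          # arbitrary grid around the data
loc, scale = 16.47028986, 0.72449091     # simple-fit values from Out[64]

pdf_matches = np.allclose(gumbel_pdf(x, loc, scale),
                          stats.gumbel_r.pdf(x, loc=loc, scale=scale))
cdf_matches = np.allclose(gumbel_cdf(x, loc, scale),
                          stats.gumbel_r.cdf(x, loc=loc, scale=scale))
print(pdf_matches, cdf_matches)
```

`gumbel_r` also provides a `fit` method, which performs an ordinary (untruncated) maximum likelihood fit. That could serve as a cross-check for the simple fit in In [64].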
In [62]:
```python
def trunc_GBL(p, x):
    threshold = p[0]
    loc = p[1]
    scale = p[2]
    x1 = x[x < threshold]                  # observations below the threshold
    nx2 = len(x[x >= threshold])           # number of observations at or above it
    L1 = (-np.log((gumbel_pdf(x1, loc, scale) / scale))).sum()
    L2 = (-np.log(1 - gumbel_cdf(threshold, loc, scale))) * nx2
    # print(x1, nx2, L1, L2)
    return L1 + L2
```
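Written out, the objective that `trunc_GBL` returns for p = (C, μ, β) is shown below. Here f and F are the Gumbel pdf and cdf computed by `gumbel_pdf` and `gumbel_cdf`, n₂ is the `nx2` count, and the division by β mirrors the scaled pdf used in the code:

$$
-\ell(C,\mu,\beta) = -\sum_{i:\,x_i < C} \log\frac{f(x_i;\mu,\beta)}{\beta} \;-\; n_2\,\log\bigl(1 - F(C;\mu,\beta)\bigr),
\qquad n_2 = \#\{\, i : x_i \ge C \,\}
$$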
In [63]:
```python
import scipy.optimize as so
```

In [64]:
```python
# first we make a simple Gumbel fit
so.fmin(lambda p, x: (-np.log(gumbel_pdf(x, p[0], p[1]))).sum(), [0.5, 0.5], args=(np.array(non_truncated_data),))
```
```
Optimization terminated successfully.
Current function value: 35.401255
Iterations: 70
Function evaluations: 133
Out[64]:
array([ 16.47028986, 0.72449091])
```
In [65]:
```python
# then we use the result as the starting value for your truncated Gumbel fit
so.fmin(trunc_GBL, [17, 16.47028986, 0.72449091], args=(np.array(non_truncated_data),))
```
```
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 25
Function evaluations: 94
Out[65]:
array([ 13.41111111, 16.65329308, 0.79694 ])
```
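In the trunc_GBL function, I use a scaled pdf (the pdf divided by scale) instead of the plain pdf. See the rationale here; the main reason is that your L1 is pdf-based while L2 is cdf-based: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sect018.htm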
Then we notice a problem. The last output shows Current function value: 0.000000, which means the negative log-likelihood is 0.
This is because:

In [66]:
```python
gumbel_cdf(13.41111111, 16.47028986, 0.72449091)
```
```
Out[66]:
2.3923515777163676e-30
```
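That is effectively 0. It means that, under the model you described, the maximum is always reached once the threshold is low enough for two things to happen. First, L1 disappears because the set x < threshold is empty. Second, L2, which is 1 - F(C), equals 1 for every item in the data.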
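To confirm this diagnosis, the sketch below re-evaluates the three-parameter `trunc_GBL` (copied from In [62], with the pdf and cdf from In [61]) at the optimum reported in Out[65]. That threshold lies below every observation, so `x1` is empty and L1 sums to 0. Also, 1 - F(C) rounds to exactly 1.0 in double precision. As a result, the objective comes out as 0:

```python
import numpy as np

data = np.array([
    15.999737471905252, 16.105716234887431, 17.947809230275304, 16.147752064149291,
    15.991427126788327, 16.687542227378565, 17.125139229445359, 19.39645340792385,
    16.837044960487795, 15.804473320190725, 16.018569387471025, 16.600876724289019,
    16.161306985203151, 17.338636901595873, 18.477371969176406, 17.897236722220281,
    16.626465201654593, 16.196548622931672, 16.013794215070927, 16.30367884232831,
    17.182106070966608, 18.984566931768452, 16.885737663740024, 16.088051117522948,
    15.790480003140173, 18.160947973898388, 18.318158853376037,
])


def gumbel_pdf(x, loc, scale):
    z = (x - loc) / scale
    return (1. / scale) * np.exp(-(z + np.exp(-z)))


def gumbel_cdf(x, loc, scale):
    return np.exp(-np.exp(-(x - loc) / scale))


def trunc_GBL(p, x):
    threshold, loc, scale = p
    x1 = x[x < threshold]                # observations below the threshold
    nx2 = len(x[x >= threshold])         # observations at or above it
    L1 = (-np.log(gumbel_pdf(x1, loc, scale) / scale)).sum()
    L2 = (-np.log(1 - gumbel_cdf(threshold, loc, scale))) * nx2
    return L1 + L2


p_opt = [13.41111111, 16.65329308, 0.79694]   # values reported in Out[65]
n_below = int((data < p_opt[0]).sum())        # how many loads fall below C
value = trunc_GBL(p_opt, data)
print(n_below, value)  # 0 0.0
```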
For this reason, your model does not look right to me. You may want to reconsider it.
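One commonly used alternative is the left-truncated likelihood. It is offered here only as a hedged sketch, not as the answerer's method. If loads are recorded only when they exceed C, each observation's density is conditioned on exceeding C. With C held fixed, this gives −log L = −Σ log f(x_i) + n·log(1 − F(C)).

The sketch makes three choices:

- It assumes the question's data are exactly the exceedances of `threshold`.
- It optimizes log(scale) so that the scale stays positive.
- It uses `scipy.optimize.minimize` with Nelder-Mead.

Whether the optimum is well defined for a given sample still needs checking. A truncated Gumbel fit can drift toward very low `loc` values when the exceedances look close to exponential.

```python
import numpy as np
from scipy.optimize import minimize

data = np.array([
    15.999737471905252, 16.105716234887431, 17.947809230275304, 16.147752064149291,
    15.991427126788327, 16.687542227378565, 17.125139229445359, 19.39645340792385,
    16.837044960487795, 15.804473320190725, 16.018569387471025, 16.600876724289019,
    16.161306985203151, 17.338636901595873, 18.477371969176406, 17.897236722220281,
    16.626465201654593, 16.196548622931672, 16.013794215070927, 16.30367884232831,
    17.182106070966608, 18.984566931768452, 16.885737663740024, 16.088051117522948,
    15.790480003140173, 18.160947973898388, 18.318158853376037,
])
threshold = 15.78581825859324


def truncated_gumbel_nll(params, x, c):
    """Negative log-likelihood of a Gumbel distribution left-truncated at c.

    params = [loc, log_scale]; log_scale is optimized so that scale stays > 0.
    """
    loc, log_scale = params
    scale = np.exp(log_scale)
    z = (x - loc) / scale
    log_pdf = -np.log(scale) - z - np.exp(-z)        # log f(x_i)
    zc = (c - loc) / scale
    log_sf_c = np.log(-np.expm1(-np.exp(-zc)))        # log(1 - F(c)), via expm1
    return -log_pdf.sum() + x.size * log_sf_c


# Starting values: method-of-moments Gumbel estimates (these ignore truncation).
scale0 = data.std() * np.sqrt(6) / np.pi
loc0 = data.mean() - 0.5772 * scale0
x0 = np.array([loc0, np.log(scale0)])

res = minimize(truncated_gumbel_nll, x0, args=(data, threshold), method="Nelder-Mead")
loc_hat, scale_hat = res.x[0], np.exp(res.x[1])
print(res.success, loc_hat, scale_hat)
```

If `res.success` is False, or if `loc_hat` keeps decreasing when you restart from different starting values, that suggests this likelihood has no interior maximum for the sample.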
We can go one step further: isolate threshold, treat it as a fixed parameter, and call the optimizer a little differently:
```python
def trunc_GBL(p, x, threshold):
    # threshold is now a fixed argument; only loc and scale are optimized
    loc = p[0]
    scale = p[1]
    x1 = x[x < threshold]
    nx2 = len(x[x >= threshold])
    L1 = (-np.log((gumbel_pdf(x1, loc, scale) / scale))).sum()
    L2 = (-np.log(1 - gumbel_cdf(threshold, loc, scale))) * nx2
    # print(x1, nx2, L1, L2)
    return L1 + L2
```
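Then, with X holding the data as a NumPy array and the threshold set to its 20th percentile:

```python
so.fmin(trunc_GBL, [0.5, 0.5], args=(X, np.percentile(X, 20)))
```
```
Optimization terminated successfully.
Current function value: 20.412818
Iterations: 72
Function evaluations: 136
Out[9]:
array([ 16.34594943, 0.45253201])
```

To keep the top 70% of the data instead, change the threshold to np.percentile(X, 30), and so on. np.percentile() simply computes the requested percentile of X.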
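The following shows how the percentile argument maps onto the share of data that stays at or above the threshold. It uses hypothetical toy data 1 to 10 and NumPy's default linear interpolation:

```python
import numpy as np

X = np.arange(1.0, 11.0)      # hypothetical toy data: 1, 2, ..., 10

t20 = np.percentile(X, 20)    # about 2.8 with the default linear interpolation
t30 = np.percentile(X, 30)    # about 3.7

frac_above_20 = (X >= t20).mean()   # share of points kept: 8 of 10
frac_above_30 = (X >= t30).mean()   # share of points kept: 7 of 10
print(t20, t30, frac_above_20, frac_above_30)
```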