Question

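I want to fit a skew normal (SN) distribution to a set of data. To do so, the location, scale and shape parameters of the SN have to be estimated.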

These parameters can also be computed analytically, but I am looking for an estimation method.

So far I have estimated the parameters with the least-squares method, but I would also like to do it with maximum likelihood.

In the first part of the code, a custom skew normal distribution with known location, scale and shape parameters is built, just to get started:

import numpy as np
import matplotlib.pyplot as plt
from scipy import special
from scipy import optimize
from scipy import stats

# PDF of a standard normal distribution:
def pdf(x):
    return 1 / np.sqrt(2 * np.pi) * np.exp(-x ** 2 / 2)  

# CDF of a standard normal distribution:
def cdf(x):
    return (1 + special.erf(x / np.sqrt(2))) / 2         

# PDF of skew normal distribution:
def skew(x, e, w, a):
    t = (x - e) / w
    return 2 / w * pdf(t) * cdf(a * t)                   

# build a custom skew normal distribution:
n = 2**10
e = 1.0    # location
w = 2.0    # scale
a = 4.0    # shape

x = np.linspace(-10, 10, n)
y = skew(x, e, w, a)

plt.plot(x, y)
plt.show()

Now let us assume that the data to which the SN model has to be fitted are:

e = 5.0    # location
w = 1.5    # scale
a = 3.0    # shape

data = skew(x, e, w, a) + stats.norm.rvs(0, 0.04, size=n)  # real data

# real data is a set of noisy data following the SN distribution
# with e=5.0, w=1.5, a=3.0

plt.plot(x, data)
plt.show()

Assuming that e, w and a are unknown, an estimation method should let us recover them from the "real" data.

Using the least-squares method:

def opt(parameters, x):
    return skew(x, parameters[0], parameters[1], parameters[2]) - data

initial_estimates = np.array([1., 1., 1.])

parameters_est = optimize.leastsq(opt, initial_estimates, (x,))

print(parameters_est)

# printed: (array([ 4.9984384 ,  1.49246143,  3.03745207]), 1) 

model1 = skew(
              x, 
              parameters_est[0][0], 
              parameters_est[0][1], 
              parameters_est[0][2]
              )

plt.plot(x, data)
plt.plot(x, model1)
plt.show()

So the least-squares method works and gives estimates of e, w and a that are close to the true values.

Now I am trying the maximum likelihood estimation method. In code:

def log_lik(params):

    location = params[0]
    scale = params[1]
    shape = params[2]

    # PDF of a normal distribution:
    phi = (1 / (scale * np.sqrt(2 * np.pi))) * np.exp((-(y-location)**2) \
          /(2 * scale ** 2))

    # CDF of a normal distribution:
    PHI = 0.5 * (1 + special.erf(shape * (y - location) \
          /(scale * np.sqrt(2))))

    # the PDF of SN is 2*phi*PHI

    # log-likelihood function for SN:
    L = n * np.log(2) - n * np.log(scale) + np.sum(np.log(phi)) \
        + np.sum(np.log(PHI))

    return L

y = data

initial_estimates = np.array([1., 1., 1.])

MLE = optimize.minimize(
                        log_lik, initial_estimates, method='nelder-mead'
                        )

print(MLE.x)

# printed: [ 1.7  0.1  1.3]

So it seems I cannot get it to work. My question is simply what the mistake is, and how to estimate e, w and a using maximum likelihood.

Answer 1

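For the univariate skew normal, people usually optimize in the centred parameterization (CP) rather than the direct parameterization (DP). If you use the DP, as you are doing now, the initial values have to be chosen more carefully, and the estimates depend strongly on them. I suggest starting from several different, well-separated starting values and then comparing the log-likelihood values.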
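To make the two parameterizations concrete, here is a minimal sketch that converts DP values (location e, scale w, shape a) into CP values (mean, standard deviation, skewness) using the standard SN moment formulas. The function name dp_to_cp is made up for this illustration and is not part of SciPy; the comparison against scipy.stats.skewnorm is only a sanity check.

```python
import numpy as np
from scipy import stats

def dp_to_cp(e, w, a):
    """Hypothetical helper: direct parameters (e, w, a) ->
    centred parameters (mean, standard deviation, skewness)."""
    delta = a / np.sqrt(1.0 + a ** 2)
    b = np.sqrt(2.0 / np.pi) * delta      # mean of the standardised SN
    mean = e + w * b
    sd = w * np.sqrt(1.0 - b ** 2)
    gamma1 = 0.5 * (4.0 - np.pi) * b ** 3 / (1.0 - b ** 2) ** 1.5
    return mean, sd, gamma1

# DP values taken from the question: e=5.0, w=1.5, a=3.0
mean, sd, gamma1 = dp_to_cp(5.0, 1.5, 3.0)

# Sanity check against SciPy's own moments (mean, variance, skewness)
m, v, s = stats.skewnorm.stats(3.0, loc=5.0, scale=1.5, moments='mvs')
print(mean, sd, gamma1)
print(float(m), float(np.sqrt(v)), float(s))
```

The CP values are ordinary summary statistics of the data, which is one reason sample mean, sample standard deviation and sample skewness tend to make reasonable starting points.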

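By the way, are you minimizing the log-likelihood? It should be maximized, so with optimize.minimize the objective should return -L.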
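The two suggestions above (minimize the negative log-likelihood, and try several starting values) can be sketched as follows. This is only an illustration under stated assumptions: the observations are simulated draws from an SN distribution via scipy.stats.skewnorm.rvs, whereas the data array in the question holds noisy density values on a grid, so it would have to be replaced by raw observations for a likelihood fit. The names neg_log_lik, sample, starts and fits are made up for this example.

```python
import numpy as np
from scipy import optimize, stats

# Assumption: observations are draws from SN(e=5.0, w=1.5, a=3.0),
# simulated here with a fixed seed for reproducibility.
sample = stats.skewnorm.rvs(3.0, loc=5.0, scale=1.5, size=2000,
                            random_state=0)

def neg_log_lik(params, obs):
    """Negative log-likelihood of the SN in the direct parameterization."""
    location, scale, shape = params
    if scale <= 0:
        return np.inf                      # scale must stay positive
    return -np.sum(stats.skewnorm.logpdf(obs, shape,
                                         loc=location, scale=scale))

# Several well-separated starting values for the shape parameter;
# location and scale start from the sample mean and standard deviation.
starts = [(np.mean(sample), np.std(sample), s0)
          for s0 in (-5.0, -1.0, 1.0, 5.0)]

fits = [optimize.minimize(neg_log_lik, start, args=(sample,),
                          method='Nelder-Mead')
        for start in starts]

# Keep the fit with the smallest negative log-likelihood,
# i.e. the largest log-likelihood.
best = min(fits, key=lambda r: r.fun)
print(best.x, -best.fun)
```

Comparing the values of r.fun across the fits shows how much the result depends on the starting point.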
