InterpolatedUniveriateSpline和interp1d三次样条不能通过非常小的y值的所有点?

时间:2018-04-25 22:47:11

标签: python scipy interpolation

我一直在寻找使用scipy中的三次样条函数来平滑一些概率密度函数(PDF),并且我遇到问题时scipy.interpolate.InterpolatedUnivariateSplinescipy.interpolate.interp1d没有通过所有结小的y值。

import math
import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.interpolate import InterpolatedUnivariateSpline, interp1d


def main():
    breadth = 0.60
    stepsize = 0.01
    mean = 0.0
    stdev = 0.02

    steps = 2. * breadth / stepsize + 1
    retRange = np.linspace(-breadth, breadth, num=steps)
    probs = {}
    for ret in retRange:
        retup = ret + stepsize / 2.
        retdown = ret - stepsize / 2.
        probup = norm.cdf(retup, loc=mean, scale=stdev)
        probdown = norm.cdf(retdown, loc=mean, scale=stdev)
        prob = probup - probdown
        probs[ret] = prob
    probs = pd.Series(probs, name='PDF')
    probs = probs / probs.sum()

    spl = InterpolatedUnivariateSpline(probs.index, probs, k=3, ext='zeros')
    tmp = pd.Series(index=retRange, data=retRange, name='SmoothedPDF').map(spl)

    out = pd.concat([probs, tmp], axis=1)
    out['diff'] = out['PDF'] - out['SmoothedPDF']
    print out

if __name__ == '__main__':
    main()

问题是输出DataFrame显示差异。我可以替换interp1d代替InterpolateUnivariateSpline并获得相同的问题(但结果不同)。问题是值得注意的是,结点处的插值y值的差异可能大于原始的y值。

                 PDF   SmoothedPDF          diff
-0.60  8.670691e-195 -2.650272e-50  2.650272e-50
-0.59  2.245130e-188  2.120217e-49 -2.120217e-49
-0.58  4.528786e-182 -9.806006e-49  9.806006e-49
-0.57  7.116703e-176  3.816392e-48 -3.816392e-48
-0.56  8.712391e-170 -1.431147e-47  1.431147e-47
-0.55  8.309255e-164  5.342948e-47 -5.342948e-47
-0.54  6.173881e-158 -1.994065e-46  1.994065e-46
-0.53  3.573809e-152  7.441963e-46 -7.441963e-46
-0.52  1.611710e-146 -2.777379e-45  2.777379e-45
-0.51  5.662799e-141  1.036532e-44 -1.036532e-44
-0.50  1.550138e-135 -3.868390e-44  3.868390e-44
-0.49  3.306066e-130  1.443703e-43 -1.443703e-43
-0.48  5.493659e-125 -5.387972e-43  5.387972e-43
-0.47  7.112603e-120  2.010818e-42 -2.010818e-42
-0.46  7.174975e-115 -7.504477e-42  7.504477e-42
-0.45  5.639566e-110  2.800709e-41 -2.800709e-41
-0.44  3.453932e-105 -1.045239e-40  1.045239e-40
-0.43  1.648294e-100  3.900884e-40 -3.900884e-40
-0.42   6.129407e-96 -1.455830e-39  1.455830e-39
-0.41   1.776139e-91  5.433231e-39 -5.433231e-39
-0.40   4.010714e-87 -2.027709e-38  2.027709e-38
-0.39   7.057745e-83  7.567514e-38 -7.567514e-38
-0.38   9.678846e-79 -2.824235e-37  2.824235e-37
-0.37   1.034450e-74  1.054019e-36 -1.054019e-36
-0.36   8.616667e-71 -3.933652e-36  3.933652e-36
-0.35   5.594107e-67  1.468059e-35 -1.468059e-35
-0.34   2.830755e-63 -5.478870e-35  5.478870e-35
-0.33   1.116539e-59  2.044742e-34 -2.044742e-34
-0.32   3.432949e-56 -7.631082e-34  7.631082e-34
-0.31   8.228222e-53  2.847958e-33 -2.847958e-33
...              ...           ...           ...
 0.31   0.000000e+00 -7.461634e-41  7.461634e-41
 0.32   0.000000e+00 -3.730817e-41  3.730817e-41
 0.33   0.000000e+00 -5.022254e-42  5.022254e-42
 0.34   0.000000e+00  3.407958e-42 -3.407958e-42
 0.35   0.000000e+00  8.968310e-43 -8.968310e-43
 0.36   0.000000e+00 -2.578389e-43  2.578389e-43
 0.37   0.000000e+00 -1.177091e-43  1.177091e-43
 0.38   0.000000e+00  9.809089e-45 -9.809089e-45
 0.39   0.000000e+00 -3.152922e-45  3.152922e-45
 0.40   0.000000e+00 -4.816963e-46  4.816963e-46
 0.41   0.000000e+00 -6.568587e-47  6.568587e-47
 0.42   0.000000e+00  4.926440e-47 -4.926440e-47
 0.43   0.000000e+00  9.579189e-48 -9.579189e-48
 0.44   0.000000e+00  3.763253e-48 -3.763253e-48
 0.45   0.000000e+00  5.131708e-49 -5.131708e-49
 0.46   0.000000e+00 -1.924391e-49  1.924391e-49
 0.47   0.000000e+00 -1.603659e-49  1.603659e-49
 0.48   0.000000e+00 -2.405488e-50  2.405488e-50
 0.49   0.000000e+00 -1.202744e-50  1.202744e-50
 0.50   0.000000e+00 -1.586954e-51  1.586954e-51
 0.51   0.000000e+00 -2.923336e-52  2.923336e-52
 0.52   0.000000e+00 -3.132146e-53  3.132146e-53
 0.53   0.000000e+00  2.610122e-54 -2.610122e-54
 0.54   0.000000e+00  3.915183e-54 -3.915183e-54
 0.55   0.000000e+00 -1.631326e-54  1.631326e-54
 0.56   0.000000e+00 -1.386627e-54  1.386627e-54
 0.57   0.000000e+00 -2.650905e-55  2.650905e-55
 0.58   0.000000e+00  1.835242e-55 -1.835242e-55
 0.59   0.000000e+00  4.078315e-56 -4.078315e-56
 0.60   0.000000e+00  0.000000e+00  0.000000e+00

[121 rows x 3 columns]

是的,我正在研究一些涉及非常低概率的事情,因此规模确实(有时)很小。

如果这是一个精确的问题,我可以做一些重新调整,但在此之前,我想确认这是问题。

作为旁注,使用线性插值(k = 1)不会显示此问题。

1 个答案:

答案 0 :(得分:0)

问题似乎是您尝试使用全局插值的样条插值器,这非常适合您的输入。由于指数截止,由于双精度,你的很多值都是零;在此范围内,样条插值器会为您提供噪声。

这里是问题的数字表示,通过在半对数刻度上绘制PDF及其样条插值(请注意,如果PDF正好为零,则会在对数图中得到缺失点):

import matplotlib.pyplot as plt

plt.figure()
plt.semilogy(out.index, out.PDF, 'r-', out.index, out.SmoothedPDF.abs(), 'bo--',
             markerfacecolor='none') # note the abs for log
plt.legend(['PDF', 'SmoothedPDF (univariate spline)'])
plt.show()

good match in range where the original value is above 1e-30, very bad otherwise

首先,我需要在插值上调用.abs()以绘制(我们重新插入非零函数!)。其次,正如我已经指出的那样,x = 0.2以上的范围应该是缺失的,因为这里的函数值正好为零。相反,我们的噪声大约为1e-50,就像小x限制一样。总而言之,我们所看到的是,如果您的数据小于1e-30,那么您就不会有机会。

我的第一个想法是使用griddata,但它显示了类似的错误结果(即使在这种情况下数字错误要小得多)。那么如何使用本地插值器呢?输入piecewise cubic Hermite spline。使用稍微修改的代码版本,以演示插值和数值稳定性:

import math
import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.interpolate import pchip_interpolate
import matplotlib.pyplot as plt


def main():
    breadth = 0.60
    stepsize = 0.01
    mean = 0.0
    stdev = 0.02

    steps = 2. * breadth / stepsize + 1
    bigRange = np.linspace(-breadth, breadth, num=5*steps)
    retRange = bigRange[::5]
    probs = {}
    for ret in retRange:
        retup = ret + stepsize / 2.
        retdown = ret - stepsize / 2.
        probup = norm.cdf(retup, loc=mean, scale=stdev)
        probdown = norm.cdf(retdown, loc=mean, scale=stdev)
        prob = probup - probdown
        probs[ret] = prob
    probs = pd.Series(probs, name='PDF')
    probs = probs / probs.sum()

    tmp_big = pd.Series(index=bigRange,
                  data=pchip_interpolate(probs.index.values, probs.values, bigRange),
                  name='SmoothedPDF_big') # denser interpolated case
    tmp_check = pd.Series(index=retRange,
                    data=pchip_interpolate(probs.index.values, probs.values, retRange),
                    name='SmoothedPDF') # base points for checking stability

    out = pd.concat([probs, tmp_check], axis=1)
    out['diff'] = out['PDF'] - out['SmoothedPDF']

    return out,tmp_big

if __name__ == "__main__":    
    out,pchipout = main()
    plt.figure()
    plt.semilogy(out.index, out.PDF, 'r-', pchipout.index, pchipout.abs().values, 'bo--',
                 markerfacecolor='none')
    plt.legend(['PDF', 'SmoothedPDF (piecewise cubic Hermit spline)'])

    plt.figure()
    plt.plot(out.index, out.PDF, 'r-', pchipout.index, pchipout.values, 'bo--',
             markerfacecolor='none')
    plt.legend(['PDF', 'SmoothedPDF (piecewise cubic Hermit spline)'])


    plt.show()

稳定性证明:

>>> out['diff'].any()
False

误差范围的对数图:

plot with PCHIP spline: very nice agreement, negligible error at 1e-180 value

如您所见,协议几乎是完美的。请注意原始数据正好为0时明显缺少值。在您需要学习的小x范围内,低于1e-100的值仍然存在噪音。

插值结果的实际图表:

interpolated figure on a lin scale