Question

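I'm running into a numerical accuracy problem with the scipy.optimize.curve_fit function in Python. It seems to me that I only get ~8 digits of precision when I want ~15. I have some data (artificially created at this point) generated from the following: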

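[Equation image not preserved; as implemented by data1 in the code below:]
$$a_{\text{data}}(r) = \frac{G M_{\text{Sun}}}{r^2} + \frac{G M_{\text{Jupiter}}\, r}{\left(r^2 + r_{\text{Jup}}^2\right)^{3/2}} + A$$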

where term 1 is ~10^-3, term 2 is ~10^-6, and term 3 is ~10^-11. In the data, I vary A randomly (it is a Gaussian error). I then try to fit this to the model:

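[Equation image not preserved; as implemented by model1 in the code below:]
$$a_{\text{model}}(r) = \frac{G M_{\text{Sun}}}{r^2}\left(1 + \alpha\, e^{-r/\lambda}\right) + \frac{G M_{\text{Jupiter}}\, r}{\left(r^2 + r_{\text{Jup}}^2\right)^{3/2}}$$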

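where lambda is a constant and I fit only alpha (it is the parameter in the function). I expect to see a linear relationship between alpha and A, because terms 1 and 2 of the data creation also appear in the model, so they should cancel exactly: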

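[Equation image not preserved; equating the data and model expressions and cancelling the shared terms gives:]
$$\frac{G M_{\text{Sun}}}{r^2}\,\alpha\, e^{-r/\lambda} = A$$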

Therefore:

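[Equation image not preserved; solving the cancelled expression for alpha gives:]
$$\alpha = \frac{A\, r^2\, e^{r/\lambda}}{G M_{\text{Sun}}} \propto A$$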

However, for small A (~10^-11 and below), alpha does not scale with A; that is, as A gets smaller and smaller, alpha levels off and stays constant.
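The expected proportionality between alpha and A can be checked without curve_fit, because the model is linear in alpha. The following is only a sketch under stated assumptions, not the method used in the question: it takes equal weights for all points, uses a constant offset A instead of Gaussian noise so that the scaling is directly visible, and the helper names (baseline, yukawa_basis, alpha_closed_form) are made up for illustration. The constants are copied from the code in the question.

```python
import numpy as np
import scipy.constants as spc

# Constants copied from the question's code.
AU = 149597871000.0
M_SUN = 1.98855e30
M_JUPITER = M_SUN / 1047.3486
RAD_JUP = 5.2 * AU
LAMBDA = 20 * AU
r = np.linspace(AU, 100 * AU, num=100)

def baseline(r):
    """Terms 1 and 2: the Sun's Newtonian term plus the Jupiter term."""
    sun = spc.G * M_SUN / r**2
    jup = spc.G * M_JUPITER * r / (r**2 + RAD_JUP**2) ** 1.5
    return sun + jup

def yukawa_basis(r):
    """The factor that multiplies alpha in the model."""
    return spc.G * M_SUN / r**2 * np.exp(-r / LAMBDA)

def alpha_closed_form(data):
    """Equal-weight least-squares alpha: sum(f * resid) / sum(f**2)."""
    f = yukawa_basis(r)
    resid = data - baseline(r)
    return np.sum(f * resid) / np.sum(f * f)

# Hypothetical check: a constant offset A (assumption) instead of random noise.
alpha_a = alpha_closed_form(baseline(r) + 1e-11)
alpha_b = alpha_closed_form(baseline(r) + 1e-12)
ratio = alpha_a / alpha_b
print(alpha_a, alpha_b, ratio)
```

With a tenfold change in A, the ratio of the two estimates should come out very close to 10. Any plateau seen with curve_fit would then point at the iterative solver rather than at the model itself.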

For reference, I am calling the following: op, pcov = scipy.optimize.curve_fit(model, xdata, ydata, p0=None, sigma=sig)

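My first thought was that I was not using double precision, but I am pretty sure that Python automatically creates numbers in double precision. Then I thought it might be a problem with the documentation, which perhaps cuts off the digits? Anyway, I could put my code in here, but it is a little complicated. Is there a way to ensure that the curve fitting function keeps my digits?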
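The double-precision assumption is easy to verify. The sketch below prints the properties of the default float type and also shows what happens when a very small perturbation is added to a value of roughly the size of term 1. The value 5.9e-3 is an illustrative assumption for term 1 near 1 AU; 1e-20 is the starting delta_a in test1().

```python
import sys
import numpy as np

# Properties of the default floating-point type.
info = np.finfo(float)
print(info.bits)            # width of the type in bits
print(info.eps)             # gap between 1.0 and the next representable float
print(sys.float_info.dig)   # decimal digits guaranteed to survive a round trip

# A perturbation far below the spacing of floats near `big` is lost on addition.
big = 5.9e-3    # assumed size of term 1 near 1 AU
tiny = 1e-20    # smallest delta_a used in test1()
lost = (big + tiny) == big
print(lost)
```

So the numbers really are doubles, but a double carries only about 15 to 16 significant digits in total, and a perturbation that is 17 or more orders of magnitude smaller than the value it is added to cannot be represented at all.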

Thank you very much for your help!

EDIT: Here is my code:

# Import proper packages
import numpy as np
import numpy.random as npr
import scipy as sp
import scipy.constants as spc
import scipy.optimize as spo
from matplotlib import pyplot as plt
from numpy import ndarray as nda
from decimal import *

# Declare global variables
AU = 149597871000.0
test_lambda = 20*AU
M_Sun = 1.98855 * np.power(10.0, 30.0)
M_Jupiter = (M_Sun/1047.3486)
test_jupiter_mass = M_Jupiter
test_sun_mass = M_Sun
rad_jup = 5.2*AU
ran = np.linspace(AU, 100*AU, num=100)
delta_a = np.power(10.0, -11.0)
chi_limit = 118.498

# Model acceleration of the spacecraft from the sun (with Yukawa term)
def model1(distance, A):
    return (spc.G)*(M_Sun/(distance**2.0))*(1 +A*(np.exp(-distance/test_lambda))) + (spc.G)*(M_Jupiter*distance)/((distance**2.0 + rad_jup**2.0)**(3.0/2.0)) 

# Function that creates a data point for test 1
def data1(distance, dela):
    return (spc.G)*(M_Sun/(distance**2.0) + (M_Jupiter*distance)/((distance**2.0 + rad_jup**2.0)**(3.0/2.0))) + dela

# Generates a list of 100 data sets varying by ~delta_a for test 1
def generate_data1():
    data_list = []
    for i in range(100):
        acc_lst = []
        for dist in ran:
            x = data1(dist, npr.normal(0, delta_a))
            acc_lst.append(x)
        data_list.append(acc_lst)
    return data_list

# Generates a list of standard deviations at each distance from the sun. Since delta_a is constant, the standard deviation of each point is constant
def generate_sig():
    sig = []
    for i in range(100):
        sig.append(delta_a)
    return sig

# Finds alpha for test 1. Since we vary delta_a in test 1, we need to generate new data each time we find alpha
def find_alpha1(data_list, sig):
    alphas = []
    for data in data_list:
        op, pcov = spo.curve_fit(model1, ran, data, p0=None, sigma=sig)
        alphas.append(op[0])
    return alphas

# Tests the dependence of alpha on delta_a and plots the dependence
def test1():
    global delta_a
    global test_lambda
    test_lambda = 20*AU
    delta_a = 10.0**-20.0
    alphas = []
    delta_as = []
    for i in range(20):
        print(i)
        data_list = generate_data1()
        print(np.array(data_list[0]))
        sig = generate_sig()
        alpha = find_alpha1(data_list, sig)    
        delas = []
        for alp in alpha:
            if alp < 0:
                x = 0
                plt.loglog(delta_a, abs(alp), '.' 'r')
            else: 
                x = 0
                plt.loglog(delta_a, alp, '.' 'b')
        delta_a *= 10
    plt.xlabel('Delta A')
    plt.ylabel('Alpha (at Lambda = 5 AU)')
    plt.show()

def main():
    test1()

if __name__ == '__main__':
    main()

Answer 1

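I think this has to do with the minimization algorithm used here and the maximum precision it can attain.

I remember reading about it in Numerical Recipes a few years ago; I'll see if I can dig up a reference for you.

EDIT:

Link to Numerical Recipes here - skip to page 394 and read that chapter. Note the third paragraph on page 404: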

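"Let us remind you once again that tol should generally be no smaller than the square root of your machine's floating-point precision."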
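To put a number on that quote, here is a small sketch that computes the square root of the machine epsilon for 64-bit floats and converts it to decimal digits. The variable names are illustrative only.

```python
import numpy as np

eps = np.finfo(float).eps       # machine epsilon for 64-bit floats
tol_floor = np.sqrt(eps)        # the practical lower bound on tol from the quote
digits = -np.log10(tol_floor)   # the same bound expressed in decimal digits
print(eps, tol_floor, digits)
```

The result is about 1.49e-8, or a little under 8 decimal digits, which is in line with the ~8 digits reported in the question. The documented default ftol and xtol of scipy.optimize.leastsq, 1.49012e-08, are this same value.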

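Mathematica also mentions that if you want accuracy you need to take a different approach, and that it does not use LMA (the Levenberg-Marquardt algorithm) unless the problem is recognized as a sum-of-squares problem.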

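Given that you are only doing a one-dimensional fit, it might be a good exercise to try implementing one of the fitting algorithms they mention in that chapter.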
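As one possible starting point for that exercise, below is a minimal sketch of a golden section search, one of the standard one-dimensional minimizers. It is not taken from the book; the function name, signature, stopping rule and the test function are all assumptions made for illustration. In a real fit, func would be the chi-square as a function of alpha.

```python
import math

def golden_section_minimize(func, a, b, tol=1.5e-8, max_iter=200):
    """Minimize a unimodal func on the bracket [a, b] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    c = b - inv_phi * (b - a)                # lower interior point
    d = a + inv_phi * (b - a)                # upper interior point
    fc, fd = func(c), func(d)
    for _ in range(max_iter):
        # Stop once the bracket is small relative to the size of x.
        if abs(b - a) <= tol * (abs(c) + abs(d)):
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper interior point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower interior point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

# Illustrative test function with its minimum at x = 2.
x_min = golden_section_minimize(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 5.0)
print(x_min)
```

The default tol is set near the square root of the machine epsilon. Near the minimum the function value changes by less than one unit of rounding error, so asking for a tighter bracket would not make the located minimum any more trustworthy.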

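What are you actually trying to achieve? As far as I understand it, you are essentially trying to calculate the amount of random noise that you have added to the curve. But that is not really what you are doing - unless I have misunderstood...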

EDIT2：

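So, after reading how you generate the data, there is a problem with the data and the model you are applying.

You are essentially fitting the following two sides:

[image not preserved]

You are basically trying to fit the height of a Gaussian to random numbers. You are not fitting a Gaussian to the frequency of those numbers.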

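Looking at your code, and judging from what you have said, this is not your end goal and you just want to get used to the optimization methods?

Would it make more sense to randomly adjust the distance from the sun, then fit the data and see whether you can minimize to find the distance that generated the data set?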

Numerical precision using scipy.optimize.curve_fit in Python