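Fitting a single Gaussian to "noisy" data yields a poor fit in certain cases

Time: 2017-04-04 13:08:45

Tags: python numpy scipy curve-fitting

I have some noisy data that can contain between 0 and n Gaussian shapes. I am trying to implement an algorithm that takes the highest data points and fits a Gaussian to them according to the following "scheme":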

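New attempt, steps:

  1. Fit a spline through all data points
  2. Get the first derivative of the spline function
  3. Get the two data points (left/right) where f'(x) = 0 around the data point with maximum intensity
  4. Fit a Gaussian through the data points returned from 3

     4a. Plot the Gaussian (stopping at the baseline) in the pdf

  5. Calculate the area under the Gaussian curve
  6. Calculate the area under the raw data points
  7. Calculate the percentage of the total area explained by the Gaussian area

I implemented this concept using the following code (minimum working example):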

    #! /usr/bin/env python
    from scipy.interpolate import InterpolatedUnivariateSpline
    from scipy.optimize import curve_fit
    from scipy.signal import argrelextrema
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    
    data = [(9.60380153195,187214),(9.62028167623,181023),(9.63676350256,174588),(9.65324602212,169389),(9.66972824591,166921),(9.68621215187,167597),(9.70269675106,170838),(9.71918105436,175816),(9.73566703995,181552),(9.75215371878,186978),(9.76864010158,191718),(9.78512816681,194473),(9.80161692526,194169),(9.81810538757,191203),(9.83459553243,186603),(9.85108637051,180273),(9.86757691233,171996),(9.88406913682,163653),(9.90056205454,156032),(9.91705467586,149928),(9.93354897998,145410),(9.95004397733,141818),(9.96653867816,139042),(9.98303506191,137546),(9.99953213889,138724)]
    data2 = [(9.60476933166,163571),(9.62125990879,156662),(9.63775225872,150535),(9.65424539203,146960),(9.67073831905,146794),(9.68723301904,149326),(9.70372850238,152616),(9.72022377931,155420),(9.73672082933,156151),(9.75321866271,154633),(9.76971628954,151549),(9.78621568961,148298),(9.80271587303,146333),(9.81921584976,146734),(9.83571759987,150351),(9.85222013334,156612),(9.86872245996,164192),(9.88522656011,171199),(9.90173144362,175697),(9.91823612015,176867),(9.93474257034,175029),(9.95124980389,171762),(9.96775683032,168449),(9.98426563055,165026)]
    
    def gaussFunction(x, *p):
        """ Gaussian with amplitude A, mean mu and standard deviation sigma,
        passed as p = (A, mu, sigma).
        """
        A, mu, sigma = p
        return A*np.exp(-(x-mu)**2/(2.*sigma**2))
    
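    def quantify(data):
        """ Fit a Gaussian to the highest peak in data, a list of
        (time, intensity) pairs, and save a plot of the result to Test.pdf.
        """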
        backGround = 105000  # Normally this is dynamically determined but this value is fine for testing on the provided data
        time,intensity = zip(*data)
        x_data = np.array(time)
        y_data = np.array(intensity)
        # np.linspace needs an integer number of samples
        newX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
        f = InterpolatedUnivariateSpline(x_data, y_data)
        fPrime = f.derivative()
        newY = f(newX)
        newPrimeY = fPrime(newX)
        maxm = argrelextrema(newPrimeY, np.greater)
        minm = argrelextrema(newPrimeY, np.less)
        breaks = maxm[0].tolist() + minm[0].tolist()
        maxPoint = 0
        for index,j in enumerate(breaks):
            try:
                if max(newY[breaks[index]:breaks[index+1]]) > maxPoint:
                    maxPoint = max(newY[breaks[index]:breaks[index+1]])
                    xData = newX[breaks[index]:breaks[index+1]]
                    yData = [x - backGround for x in newY[breaks[index]:breaks[index+1]]]
            except:
                pass
        # Gaussian fit on main points
        newGaussX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
        p0 = [np.max(yData), xData[np.argmax(yData)],0.1]
        try:
            coeff, var_matrix = curve_fit(gaussFunction, xData, yData, p0)
            newGaussY = gaussFunction(newGaussX, *coeff)
            newGaussY = [x + backGround for x in newGaussY]
    
    
            # Generate plot for visual confirmation
            fig = plt.figure()
    
            ax = fig.add_subplot(111)
            plt.plot(x_data, y_data, 'b*')
    
            plt.plot((newX[0],newX[-1]),(backGround,backGround),'red')
            plt.plot(newX,newY, color='blue',linestyle='dashed')
            plt.plot(newGaussX, newGaussY, color='green',linestyle='dashed')
            plt.title("Test")
            plt.xlabel("rt [m]")
            plt.ylabel("intensity [au]")
            plt.savefig("Test.pdf",bbox_inches="tight")
            plt.close(fig)
        except:
            pass
    
    # Call the test
    #quantify(data)
    quantify(data2)
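
The area steps of the scheme (area under the Gaussian, area under the raw data points, and the percentage explained) are not part of the example above. A minimal sketch of how they might look follows, assuming that both areas are measured above the baseline and over the same x range. The helper names trapezoid_area, gaussian and explained_percentage are hypothetical and do not come from the original code.

```python
import numpy as np

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through (x, y), trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def gaussian(x, A, mu, sigma):
    """Same functional form as gaussFunction in the example."""
    return A * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

def explained_percentage(x, y, coeff, background):
    """Percentage of the raw area above the baseline covered by the fitted Gaussian.

    Assumption: the raw signal is clipped at the baseline, so points below
    the background contribute no area.
    """
    x = np.asarray(x, dtype=float)
    raw = np.clip(np.asarray(y, dtype=float) - background, 0.0, None)
    grid = np.linspace(x[0], x[-1], 2001)  # dense grid for the smooth curve
    gauss_area = trapezoid_area(grid, gaussian(grid, *coeff))
    raw_area = trapezoid_area(x, raw)
    return 100.0 * gauss_area / raw_area
```

With the coefficients returned by curve_fit, a call such as explained_percentage(x_data, y_data, coeff, backGround) would give the percentage for one peak. Whether the raw area should be clipped at the baseline is a design choice that depends on how the background is determined.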
    

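Normally the background (the red line in the pictures below) is determined dynamically, but for the sake of this example I have set it to a fixed number. The problem I am having is that for some data it works really well:

[image: plot of the fit]

With the corresponding f'(x):

[image: plot of f'(x)]

However, for some other data it fails horribly:

[image: plot of the fit]

With the corresponding f'(x):

[image: plot of f'(x)]

I would therefore like to hear suggestions or thoughts on why this happens, and on potential approaches to fix it. I have included the data shown in the pictures in the example above (in case anyone wants to try it).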

1 Answer:

Answer 0 (score: 0)

The error is in the following bit:

    breaks = maxm[0].tolist() + minm[0].tolist()
    for index,j in enumerate(breaks):

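The breaks list now contains both the maxima and the minima, but they are not sorted by time. As a result, the list yields the breakpoints in the wrong order: 9.78, 9.62 and 9.86.

The program then examines the data from 9.78 to 9.62 and from 9.62 to 9.86. The range from 9.62 to 9.86 therefore ends up as the one containing the highest-intensity data point, which produces the fit shown in the second graph.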
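
The effect can be reproduced in isolation. The sketch below uses a synthetic sine wave as a stand-in for the spline derivative (an assumption, not the question's data). It shows that concatenating the two index lists leaves them unsorted, so some neighbouring pairs describe a reversed, and therefore empty, slice. Calling max() on such a slice raises ValueError, which the bare except in the question's loop silently discards.

```python
import numpy as np
from scipy.signal import argrelextrema

# Synthetic stand-in for the spline derivative (assumption, not the real data).
x = np.linspace(0.0, 4.0 * np.pi, 400)
deriv = np.sin(x)

maxm = argrelextrema(deriv, np.greater)
minm = argrelextrema(deriv, np.less)

# Same construction as in the question: all maxima first, then all minima.
breaks = maxm[0].tolist() + minm[0].tolist()
is_sorted = breaks == sorted(breaks)

# Neighbouring pairs whose start index lies after their end index.
reversed_pairs = [(a, b) for a, b in zip(breaks, breaks[1:]) if a > b]

# Slicing with start > stop gives an empty array instead of an error.
empty_slices = [len(x[a:b]) for a, b in reversed_pairs]

print("breaks:", breaks)
print("sorted:", is_sorted)
print("reversed pairs:", reversed_pairs, "slice lengths:", empty_slices)
```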

The fix is rather simple: just sort breaks before looping over it, like so:

    breaks = maxm[0].tolist() + minm[0].tolist()
    breaks = sorted(breaks)
    for index,j in enumerate(breaks):

The program then yields a fit that is much closer to what I expected:

[image: plot of the fit]
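
As a variation on the same fix, the segment search can be written so that it needs no try/except at all. This is only a sketch: highest_segment and its parameter names are hypothetical, and np.unique is used to sort and de-duplicate the breakpoints in one step, which goes slightly beyond the plain sorted() call.

```python
import numpy as np

def highest_segment(new_x, new_y, maxima_idx, minima_idx):
    """Return the (x, y) slice between neighbouring breakpoints that holds the largest y.

    Returns None when fewer than two distinct breakpoints are available.
    """
    new_x = np.asarray(new_x, dtype=float)
    new_y = np.asarray(new_y, dtype=float)
    # np.unique returns the indices sorted and without duplicates,
    # so every (start, stop) pair below satisfies start < stop.
    breaks = np.unique(np.concatenate([maxima_idx, minima_idx])).astype(int)
    best = None
    for start, stop in zip(breaks[:-1], breaks[1:]):
        seg_y = new_y[start:stop]
        if best is None or seg_y.max() > best[1].max():
            best = (new_x[start:stop], seg_y)
    return best
```

In the question's code this would be called as highest_segment(newX, newY, maxm[0], minm[0]), with the background subtracted from the returned y values before the fit. Iterating over pairs with zip also avoids the IndexError that breaks[index+1] raises on the last element.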