Question

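I have some noisy data that can contain between 0 and n Gaussian shapes. I am trying to implement an algorithm that takes the highest data points and fits a Gaussian to them according to the following "scheme":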

New attempt, steps:

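1. Fit a spline through all data points
2. Get the first derivative of the spline function
3. Get the two data points (left/right) where f'(x) = 0 around the data point with the maximum intensity
4. Fit a Gaussian through the data points returned from step 3
4a. Plot the Gaussian in the PDF
5. Calculate the area under the Gaussian curve
6. Calculate the area under the raw data points
7. Calculate the percentage of the total area explained by the Gaussian area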
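The last three steps (the two areas and the percentage) are not part of the code in this post. A minimal sketch of how they might look, assuming the areas are taken above the background with the trapezoidal rule; the helper names `trapezoid_area` and `explained_fraction` and the synthetic data are hypothetical, not part of the original program:

```python
import numpy as np

def gaussian(x, amplitude, mu, sigma):
    """Gaussian without offset, same functional form as gaussFunction."""
    return amplitude * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

def trapezoid_area(x, y):
    """Area under sampled points using the trapezoidal rule (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def explained_fraction(x, y, coeff, background):
    """Fraction of the background-corrected raw area covered by the fitted Gaussian."""
    x = np.asarray(x, dtype=float)
    raw_area = trapezoid_area(x, np.asarray(y, dtype=float) - background)
    gauss_area = trapezoid_area(x, gaussian(x, *coeff))
    return gauss_area / raw_area

# Synthetic example (assumption): a pure Gaussian sitting on a flat background,
# so the Gaussian should explain essentially all of the corrected area.
x = np.linspace(-5.0, 5.0, 2001)
coeff = (2.0, 0.0, 1.0)          # amplitude, mu, sigma
y = gaussian(x, *coeff) + 10.0   # raw signal including background
fraction = explained_fraction(x, y, coeff, background=10.0)
print("explained fraction:", fraction)
```

In the real program, `coeff` would be the result of `curve_fit` and `background` the dynamically determined baseline; multiply the fraction by 100 for a percentage.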

I implemented this concept with the following code (minimal working example):

#! /usr/bin/env python
from scipy.interpolate import InterpolatedUnivariateSpline
from scipy.optimize import curve_fit
from scipy.signal import argrelextrema
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

data = [(9.60380153195,187214),(9.62028167623,181023),(9.63676350256,174588),(9.65324602212,169389),(9.66972824591,166921),(9.68621215187,167597),(9.70269675106,170838),(9.71918105436,175816),(9.73566703995,181552),(9.75215371878,186978),(9.76864010158,191718),(9.78512816681,194473),(9.80161692526,194169),(9.81810538757,191203),(9.83459553243,186603),(9.85108637051,180273),(9.86757691233,171996),(9.88406913682,163653),(9.90056205454,156032),(9.91705467586,149928),(9.93354897998,145410),(9.95004397733,141818),(9.96653867816,139042),(9.98303506191,137546),(9.99953213889,138724)]
data2 = [(9.60476933166,163571),(9.62125990879,156662),(9.63775225872,150535),(9.65424539203,146960),(9.67073831905,146794),(9.68723301904,149326),(9.70372850238,152616),(9.72022377931,155420),(9.73672082933,156151),(9.75321866271,154633),(9.76971628954,151549),(9.78621568961,148298),(9.80271587303,146333),(9.81921584976,146734),(9.83571759987,150351),(9.85222013334,156612),(9.86872245996,164192),(9.88522656011,171199),(9.90173144362,175697),(9.91823612015,176867),(9.93474257034,175029),(9.95124980389,171762),(9.96775683032,168449),(9.98426563055,165026)]

def gaussFunction(x, *p):
    """Gaussian without offset; p = (A, mu, sigma): amplitude, center, width."""
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2.*sigma**2))

def quantify(data):
    """Fit a Gaussian to the highest peak in data and save a plot to Test.pdf."""
    backGround = 105000  # Normally this is dynamically determined but this value is fine for testing on the provided data
    time,intensity = zip(*data)
    x_data = np.array(time)
    y_data = np.array(intensity)
    # np.linspace needs an integer sample count, so truncate the float product
    newX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
    f = InterpolatedUnivariateSpline(x_data, y_data)
    fPrime = f.derivative()
    newY = f(newX)
    newPrimeY = fPrime(newX)
    maxm = argrelextrema(newPrimeY, np.greater)
    minm = argrelextrema(newPrimeY, np.less)
    breaks = maxm[0].tolist() + minm[0].tolist()
    maxPoint = 0
    for index,j in enumerate(breaks):
        try:
            if max(newY[breaks[index]:breaks[index+1]]) > maxPoint:
                maxPoint = max(newY[breaks[index]:breaks[index+1]])
                xData = newX[breaks[index]:breaks[index+1]]
                yData = [x - backGround for x in newY[breaks[index]:breaks[index+1]]]
        except (IndexError, ValueError):
            # IndexError: no break after the last one; ValueError: empty slice
            pass
    # Gaussian fit on main points
    newGaussX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
    p0 = [np.max(yData), xData[np.argmax(yData)],0.1]
    try:
        coeff, var_matrix = curve_fit(gaussFunction, xData, yData, p0)
        newGaussY = gaussFunction(newGaussX, *coeff)
        newGaussY = [x + backGround for x in newGaussY]


        # Generate plot for visual confirmation
        fig = plt.figure()

        ax = fig.add_subplot(111)
        plt.plot(x_data, y_data, 'b*')

        plt.plot((newX[0],newX[-1]),(backGround,backGround),'red')
        plt.plot(newX,newY, color='blue',linestyle='dashed')
        plt.plot(newGaussX, newGaussY, color='green',linestyle='dashed')
        plt.title("Test")
        plt.xlabel("rt [m]")
        plt.ylabel("intensity [au]")
        plt.savefig("Test.pdf",bbox_inches="tight")
        plt.close(fig)
    except:
        pass

# Call the test
#quantify(data)
quantify(data2)

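Normally the background (the red line in the figures below) is determined dynamically, but for the sake of this example I have set it to a fixed number. The problem I am running into is that it works very well for some data:

The corresponding f'(x):

However, for some other data it fails badly:

The corresponding f'(x):

So I would like to hear suggestions or ideas about why this happens, and potential approaches to fix it. I have included the data shown in the pictures, in case anyone wants to try it.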

Answer 1

The error is in the following bit:

breaks = maxm[0].tolist() + minm[0].tolist()
for index,j in enumerate(breaks):

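The breaks list now contains both the maxima and the minima, but they are not sorted by time. As a result, the list yields the following unsuitable sequence of break points: 9.78, 9.62 and 9.86.

The program then examines the data from 9.78 to 9.62 and from 9.62 to 9.86. Because the 9.62 to 9.86 window contains the highest-intensity data point, it gets selected, which produces the fit shown in the second figure.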
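To make the ordering problem concrete, here is a small sketch on a toy signal (a sine wave, chosen only for illustration and not taken from the question's data). It shows that concatenating the maxima and minima index lists leaves them out of order, and that a window whose start index lies after its stop index is an empty slice:

```python
import numpy as np
from scipy.signal import argrelextrema

# Toy stand-in for newPrimeY (assumption): two maxima and two minima
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 400))

maxm = argrelextrema(signal, np.greater)
minm = argrelextrema(signal, np.less)

# Same concatenation as in the question: all maxima first, then all minima
breaks = maxm[0].tolist() + minm[0].tolist()
print("breaks:        ", breaks)
print("sorted breaks: ", sorted(breaks))

# Consecutive pairs as the loop in the question would visit them
windows = list(zip(breaks[:-1], breaks[1:]))
empty = [(a, b) for a, b in windows if len(signal[a:b]) == 0]
print("windows:", windows)
print("empty windows (start after stop):", empty)
```

The non-empty windows that remain can span several extrema at once, which is how a window such as 9.62 to 9.86 ends up being treated as a single peak.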

The fix is rather simple: just sort the breaks list, as follows:

breaks = maxm[0].tolist() + minm[0].tolist()
breaks = sorted(breaks)
for index,j in enumerate(breaks):
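The same idea can also be written without the index arithmetic and the try/except. The following is only a sketch of one possible variant, not the code actually used here; the function name `highest_window` and the toy data are hypothetical:

```python
import numpy as np

def highest_window(new_x, new_y, maxima_idx, minima_idx):
    """Return (x, y) for the window between consecutive sorted breaks
    that contains the largest y value, or None if there is no window."""
    breaks = np.sort(np.concatenate((maxima_idx, minima_idx)))
    best = None
    # zip pairs each break with its successor, so no IndexError can occur
    for start, stop in zip(breaks[:-1], breaks[1:]):
        segment = new_y[start:stop]
        if segment.size and (best is None or segment.max() > best[1].max()):
            best = (new_x[start:stop], segment)
    return best

# Toy example (assumption): one peak at x = 7, break indices given out of order
new_x = np.linspace(0.0, 10.0, 101)
new_y = np.exp(-(new_x - 7.0) ** 2)
best = highest_window(new_x, new_y, np.array([80, 20]), np.array([60, 40]))
print("window from", best[0][0], "to", best[0][-1])
```

In the original program the two index arrays would be `maxm[0]` and `minm[0]`, and the background would still need to be subtracted from the returned y values before fitting.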

The program then produces a fit that is much closer to what I expected:

Fitting a single Gaussian to "noisy" data yields a poor fit in some cases
