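I have some noisy data that can contain anywhere between 0 and n Gaussian shapes. I am trying to implement an algorithm that takes the highest data points and fits a Gaussian to them according to the following "scheme":
Fit a Gaussian through the data points returned from step 3.
4A. ... in the PDF.
I implemented this concept with the following code (a minimal working example):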
#! /usr/bin/env python
from scipy.interpolate import InterpolatedUnivariateSpline
from scipy.optimize import curve_fit
from scipy.signal import argrelextrema
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
data = [(9.60380153195,187214),(9.62028167623,181023),(9.63676350256,174588),(9.65324602212,169389),(9.66972824591,166921),(9.68621215187,167597),(9.70269675106,170838),(9.71918105436,175816),(9.73566703995,181552),(9.75215371878,186978),(9.76864010158,191718),(9.78512816681,194473),(9.80161692526,194169),(9.81810538757,191203),(9.83459553243,186603),(9.85108637051,180273),(9.86757691233,171996),(9.88406913682,163653),(9.90056205454,156032),(9.91705467586,149928),(9.93354897998,145410),(9.95004397733,141818),(9.96653867816,139042),(9.98303506191,137546),(9.99953213889,138724)]
data2 = [(9.60476933166,163571),(9.62125990879,156662),(9.63775225872,150535),(9.65424539203,146960),(9.67073831905,146794),(9.68723301904,149326),(9.70372850238,152616),(9.72022377931,155420),(9.73672082933,156151),(9.75321866271,154633),(9.76971628954,151549),(9.78621568961,148298),(9.80271587303,146333),(9.81921584976,146734),(9.83571759987,150351),(9.85222013334,156612),(9.86872245996,164192),(9.88522656011,171199),(9.90173144362,175697),(9.91823612015,176867),(9.93474257034,175029),(9.95124980389,171762),(9.96775683032,168449),(9.98426563055,165026)]
def gaussFunction(x, *p):
    """ Gaussian with amplitude A, centre mu and width sigma, packed in p.
    """
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2.*sigma**2))

def quantify(data):
    """ Fit a Gaussian to the highest segment of data and save a plot to Test.pdf.
    """
    backGround = 105000 # Normally this is dynamically determined but this value is fine for testing on the provided data
    time,intensity = zip(*data)
    x_data = np.array(time)
    y_data = np.array(intensity)
    # np.linspace needs an integer number of points
    newX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
    f = InterpolatedUnivariateSpline(x_data, y_data)
    fPrime = f.derivative()
    newY = f(newX)
    newPrimeY = fPrime(newX)
    # Local extrema of f'(x) delimit the candidate segments
    maxm = argrelextrema(newPrimeY, np.greater)
    minm = argrelextrema(newPrimeY, np.less)
    breaks = maxm[0].tolist() + minm[0].tolist()
    maxPoint = 0
    # Keep the segment between consecutive breaks that holds the highest point
    for index,j in enumerate(breaks):
        try:
            if max(newY[breaks[index]:breaks[index+1]]) > maxPoint:
                maxPoint = max(newY[breaks[index]:breaks[index+1]])
                xData = newX[breaks[index]:breaks[index+1]]
                yData = [x - backGround for x in newY[breaks[index]:breaks[index+1]]]
        except (IndexError, ValueError):
            # IndexError: no break after the last one; ValueError: max() of an empty slice
            pass
    # Gaussian fit on main points
    newGaussX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
    p0 = [np.max(yData), xData[np.argmax(yData)],0.1]
    try:
        coeff, var_matrix = curve_fit(gaussFunction, xData, yData, p0)
        newGaussY = gaussFunction(newGaussX, *coeff)
        newGaussY = [x + backGround for x in newGaussY]
        # Generate plot for visual confirmation
        fig = plt.figure()
        ax = fig.add_subplot(111)
        plt.plot(x_data, y_data, 'b*')
        plt.plot((newX[0],newX[-1]),(backGround,backGround),'red')
        plt.plot(newX,newY, color='blue',linestyle='dashed')
        plt.plot(newGaussX, newGaussY, color='green',linestyle='dashed')
        plt.title("Test")
        plt.xlabel("rt [m]")
        plt.ylabel("intensity [au]")
        plt.savefig("Test.pdf",bbox_inches="tight")
        plt.close(fig)
    except Exception:
        # Skip the plot when the fit fails
        pass

# Call the test
#quantify(data)
quantify(data2)
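The code above cuts the signal at the local extrema of f'(x). As a rough illustration of why that is a sensible place to cut, the sketch below applies the same spline and derivative steps to a noise-free Gaussian, where the extrema of the derivative are the inflection points at mu - sigma and mu + sigma. The peak parameters and the names gauss, left_break and right_break are made up for this illustration and are not part of the original code.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
from scipy.signal import argrelextrema

def gauss(x, A, mu, sigma):
    """Plain Gaussian, same shape as gaussFunction in the question."""
    return A * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

# Hypothetical noise-free peak (values chosen for illustration only)
mu, sigma = 9.80, 0.06
x = np.linspace(9.60, 10.00, 25)
y = gauss(x, 90000.0, mu, sigma)

# Spline through the points and its first derivative, as in quantify()
f = InterpolatedUnivariateSpline(x, y)
f_prime = f.derivative()

new_x = np.linspace(x[0], x[-1], 1000)
slope = f_prime(new_x)

# Indices of the local maxima and minima of f'(x)
rising = argrelextrema(slope, np.greater)[0]
falling = argrelextrema(slope, np.less)[0]

# Keep the steepest rise and the steepest fall; tiny spline wiggles in the
# tails may add further, much weaker extrema
left_break = new_x[rising[np.argmax(slope[rising])]]
right_break = new_x[falling[np.argmin(slope[falling])]]

print(left_break, right_break)  # close to mu - sigma and mu + sigma
```

On real, noisy data there are usually many more extrema than these two, which is why the code then has to decide which pair of neighbouring breaks encloses the peak of interest.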
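Normally the background (the red line in the figures below) is determined dynamically, but for the sake of this example I set it to a fixed number. The problem I have is that for some data it works really well:
The corresponding f'(x):
However, for some other data it fails horribly:
The corresponding f'(x):
I would therefore like to hear suggestions or ideas on why this happens, as well as potential ways to fix it. I have included the data shown in the pictures below (in case anyone wants to try it):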
Answer 0 (score: 0)
The error is in the following bit:
breaks = maxm[0].tolist() + minm[0].tolist()
for index,j in enumerate(breaks):
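The breaks list now contains both the maxima and the minima, but they are not sorted by time. As a result, the list yields the following ill-ordered break points: 9.78, 9.62 and 9.86.
The program then checks the data from 9.78 to 9.62 and from 9.62 to 9.86. The first slice is empty because its start lies after its end, so the 9.62 to 9.86 segment, which spans across the break at 9.78, is the one that contains the highest-intensity data point, producing the fit shown in the second figure.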
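The effect can be reproduced on a toy array. This is only a sketch: new_y, maxima and minima are made-up stand-ins for the interpolated signal and the indices returned by argrelextrema, arranged to mimic the 9.78, 9.62, 9.86 ordering described above.

```python
import numpy as np

# Hypothetical stand-ins for the interpolated signal and the detected breaks
new_y = np.array([5.0, 3.0, 2.0, 4.0, 7.0, 9.0, 8.0, 6.0, 4.0, 3.0])
maxima = [6]      # index of a local maximum of f'(x)
minima = [1, 8]   # indices of local minima of f'(x)

# Concatenating the two lists leaves the indices out of order: [6, 1, 8]
breaks = maxima + minima

# Consecutive pairs, as visited by the loop in the question
pairs = list(zip(breaks, breaks[1:]))
segments = [new_y[start:stop] for start, stop in pairs]

for (start, stop), segment in zip(pairs, segments):
    print(start, stop, len(segment))
```

The first pair runs backwards and yields an empty slice, while the second pair covers everything between the two minima, including the break that should have split it in two.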
The fix is fairly simple: just sort the breaks before looping over them, as follows:
breaks = maxm[0].tolist() + minm[0].tolist()
breaks = sorted(breaks)
for index,j in enumerate(breaks):
The program then produced a fit much closer to what I expected:
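For completeness, here is one possible way to wrap the sorted-breaks selection in a small helper. It is a sketch rather than the code actually used: the function name highest_segment and the toy arrays are hypothetical, and pairing neighbours with zip replaces the index arithmetic and the try/except of the original loop.

```python
import numpy as np

def highest_segment(new_x, new_y, maxima, minima):
    """Return (x, y) of the slice between adjacent breaks with the largest y.

    maxima and minima are index arrays such as argrelextrema(...)[0].
    Returns None when fewer than two breaks are available.
    """
    # Merge both kinds of breaks and put them in ascending order
    breaks = np.sort(np.concatenate((maxima, minima))).astype(int)
    best = None
    for start, stop in zip(breaks[:-1], breaks[1:]):
        segment = new_y[start:stop]
        if best is None or segment.max() > best[1].max():
            best = (new_x[start:stop], segment)
    return best

# Toy data (made up for illustration)
toy_x = np.arange(10.0)
toy_y = np.array([5.0, 3.0, 2.0, 4.0, 7.0, 9.0, 8.0, 6.0, 4.0, 3.0])
seg_x, seg_y = highest_segment(toy_x, toy_y, [6], [1, 8])
print(seg_x, seg_y)
```

With the breaks in order, the selected slice stops at the break in the middle instead of running across it.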