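I'm trying to create a univariate spline interpolation in Python to fit a large set of data, but when I plot the two there seems to be a big difference between them. I have tried setting the smoothing factor to many different values, including zero, which should force the spline through every data point. Even so, the plot still shows a large difference.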
##
# Univariate Spline Interpolation
##
## This function interpolates the data by creating multiple times the amount of points in the data set and fitting a spline to it
## Input:
# dataX - X axis that corresponds to the dataset
# dataY - Y axis of data to fit spline on (must be same size as dataX)
# multiple - the multiplication factor, default is 2 ( <1 - fewer points, 1 - same amount of points, >1 - more points)
# order - order of spline, default is 4 (3 - Cubic, 4 - Quartic)
## Output
# splinedDataX - splined X Axis
# splinedDataY - splined Y Axis
def univariate_spline_interpolation( dataX, dataY, multiple=2, order=4):
    #Libraries
    from numpy import linspace,exp
    from numpy.random import randn
    import matplotlib.pyplot as plt
    from scipy.interpolate import UnivariateSpline, LSQUnivariateSpline
    #Find sizes of x and y axis for comparison and multiple
    sizeX = len(dataX)
    sizeY = len(dataY)
    #Error catching
    if(sizeX != sizeY):
        print "Data X axis and Y axis must have same size"
        return
    if(multiple <= 0):
        print "Multiple must be greater than 0"
        return
    if(order < 1 or order >5):
        print "Order must be 1 <= order <= 5"
        return
    #Create Spline
    s = UnivariateSpline(dataX, dataY, k=3, s=0)
    # s is smoothing factor so spline doesn't shoot off in between data points
    #Positive smoothing factor used to choose the number of knots.
    #Number of knots will be increased until the smoothing condition is satisfied:
    # sum((w[i]*(y[i]-s(x[i])))**2,axis=0) <= s
    #If None (default), s=len(w) which should be a good value if 1/w[i] is an estimate of the standard deviation of y[i].
    #If 0, spline will interpolate through all data points.
    #Create new axis based on numPoints
    numPoints = sizeX * multiple #Find number of points for spline
    startPt = dataX[1] #find value of first point on x axis
    endPt = dataX[-1] #find value of last point on x axis
    splinedDataX = linspace(startPt, endPt, numPoints) #create evenly spaced points on axis based on start, end, and number of desired data points
    #Create Y axis of splined Data
    splinedDataY = s(splinedDataX) #Create new Y axis with numPoints entries of data splined to fit the original data
    return splinedDataX, splinedDataY
##
# Test Cubic Spline
##
import matplotlib.pyplot as plt

splinedX, splinedY = univariate_spline_interpolation(sensorTimestamp, filteredData[1], multiple=1)
print "old x data"
print "length", len(sensorTimestamp)
print "Starts", sensorTimestamp[0]
print "Ends", sensorTimestamp[-1]
print ""
print "new x data"
print "length", len(splinedX)
print "multiple", len(splinedX)/len(filteredData[1])
print "Starts", splinedX[0]
print "Ends", splinedX[-1]
print ""
print "old y data"
print "length", len(filteredData[1])
print "Starts", filteredData[1][0]
print "Ends", filteredData[1][-1]
print ""
print "new y data"
print "length", len(splinedY)
print "Starts", splinedY[0]
print "Ends", splinedY[-1]
difference = []
for i in splinedY:
    difference.append(splinedY[i] - filteredData[1][i])
plt.figure(figsize=(20,3))
plt.plot(sensorTimestamp, filteredData[1], label='Non-Splined', marker='*')
plt.plot(splinedX, splinedY, label='Splined')
plt.plot(sensorTimestamp, difference, label='Difference', ls='--')
plt.title(' BW Filtered Data from LED 1')
plt.axis([19, 30, -300, 300])
plt.legend(loc='best')
plt.show()
Printed output:
old x data
length 14690
Starts 0.0
Ends 497.178565979
new x data
length 14690
multiple 1.0
Starts 0.0555429458618
Ends 497.178565979
old y data
length 14690
Starts 50.2707843894
Ends 341.661410048
new y data
length 14690
Starts 416.803282313
Ends 341.661410048
As you can see, the difference is large, yet on the plot the two data sets appear to be exactly the same points (or very close to them).
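One way to test the premise that `s=0` forces the spline through every data point is to evaluate the spline at the original x values instead of on a new grid. The sketch below assumes Python 3 with NumPy and SciPy installed. It uses hypothetical, unevenly spaced sample data (`x`, `y`) in place of `sensorTimestamp` and `filteredData[1]`. The names `spl` and `max_error` are illustrative only.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical, strictly increasing but unevenly spaced x values and a signal.
x = np.linspace(0.0, 3.0, 40) ** 2
y = np.sin(x)

# s=0 requests an interpolating spline: it should pass through every (x, y) pair.
spl = UnivariateSpline(x, y, k=3, s=0)

# Evaluate at the ORIGINAL x values and measure the worst-case mismatch.
max_error = np.max(np.abs(spl(x) - y))
print(max_error)
```

If the mismatch at the original x values is tiny, the spline itself is behaving as documented. A large "difference" must then come from how the two arrays are being compared.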
Answer 0 (score: 0)
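When you compute the difference with `splinedY[i] - filteredData[1][i]`, the data do not appear to be aligned correctly on the x axis, so you are subtracting values that are not at the same x position. The timestamps of the splined data are shifted to the right because the `univariate_spline_interpolation` function sets `startPt = dataX[1]` while keeping the same number of points as the input x data. This is why the new x data starts at 0.0555429458618 instead of 0.0. That line should probably be changed to `startPt = dataX[0]`.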
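Here is a minimal sketch of the shift described above. It assumes Python 3 with NumPy and SciPy, and uses hypothetical evenly spaced timestamps `t` with a synthetic signal `y` in place of the asker's data. The sketch builds the grid both ways (the question's `dataX[1]` start and the suggested `dataX[0]` start) and compares the spline values index by index against the original samples.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical evenly spaced timestamps and signal (stand-ins for the real data).
t = np.linspace(0.0, 10.0, 101)
y = np.sin(t)
spl = UnivariateSpline(t, y, k=3, s=0)

# Grid as built in the question: starts at the SECOND sample, same point count.
shifted = np.linspace(t[1], t[-1], len(t))
# Grid with the suggested fix: starts at the first sample.
fixed = np.linspace(t[0], t[-1], len(t))

# Index-by-index differences against the original samples.
diff_shifted = spl(shifted) - y
diff_fixed = spl(fixed) - y

print(np.max(np.abs(diff_shifted)))  # about 0.1: points are compared at different x
print(np.max(np.abs(diff_fixed)))    # near zero: here `fixed` is identical to `t`
```

In this sketch the fixed grid coincides with the original timestamps only because `t` is evenly spaced. With irregular sampling, an index-by-index comparison is still misaligned.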
Answer 1 (score: 0)
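I think computing the difference with `splinedY[i] - filteredData[1][i]` is fundamentally problematic. `splinedY[i]` is computed inside `univariate_spline_interpolation()` from evenly spaced X values (`splinedDataX`). `splinedDataX` is not the same as `sensorTimestamp`, so comparing their corresponding Y values is meaningless.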
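One way to make the comparison meaningful is to evaluate the spline at the same x values as the data and reserve the evenly spaced grid for plotting. Below is a sketch of that approach, assuming Python 3 with NumPy and SciPy. It uses hypothetical unevenly spaced timestamps `t` and a synthetic signal `y`. The names `grid`, `residual`, and `dense_y` are illustrative.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical unevenly spaced timestamps and signal.
t = np.linspace(0.0, 3.0, 60) ** 2
y = np.cos(t)
spl = UnivariateSpline(t, y, k=3, s=0)

# Even when it starts at t[0], an evenly spaced grid does not land on t.
grid = np.linspace(t[0], t[-1], len(t))
print(np.max(np.abs(grid - t)))  # large: the x values themselves differ

# Meaningful comparison: evaluate the spline at the SAME x values as the data.
residual = spl(t) - y
print(np.max(np.abs(residual)))  # near zero for an interpolating (s=0) spline

# The evenly spaced curve is still fine for plotting against `grid`.
dense_y = spl(grid)
```

Keeping the two uses separate is deliberate. `spl(t)` answers "how far is the spline from the data?", while `spl(grid)` answers "what does the spline look like between samples?".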
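After googling how to write a for loop in Python, I think the culprit is the statement `for i in splinedY:`. The `i` in this statement is the value of an array element, not an index. The correct syntax is `for i in range(len(splinedY)):`.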
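The following illustrates the difference between iterating over values and iterating over indices, assuming Python 3 with NumPy. The small arrays `splined` and `original` are made-up stand-ins for `splinedY` and `filteredData[1]`.

```python
import numpy as np

# Made-up values standing in for splinedY and filteredData[1].
splined = np.array([3.0, 5.0, 7.0])
original = np.array([1.0, 5.0, 4.0])

# Iterating directly over an array yields its element VALUES, not positions.
values_seen = [v for v in splined]
print(values_seen)

# Iterating over positions pairs elements by index.
difference_loop = []
for i in range(len(splined)):
    difference_loop.append(splined[i] - original[i])
print(difference_loop)

# Idiomatic NumPy: element-wise subtraction of equal-length arrays.
difference_vec = splined - original
print(difference_vec)  # [2. 0. 3.]
```

Both the loop and the vectorized form still pair values by position. They only give a meaningful difference once both arrays refer to the same x values, as the first part of this answer notes.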