Is there a simple way to smooth a curve without taking future values into account and without a time shift?

Time: 2019-09-05 20:47:12

Tags: python pandas numpy scipy smoothing

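I have a Unix time series (x) with an associated signal value (y). A new value is generated every minute; the oldest value is dropped and the new one is appended. I am trying to smooth the resulting curve without losing time accuracy, with particular emphasis on the final value of the smoothed curve, which is written to a database. I would like to be able to adjust the degree of smoothing over a wide range.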

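I have looked into every option I could find and grasp (I am more or less a layman in mathematics). I came across Savitzky-Golay, which looked perfect until I realized that it works well on past data but cannot produce a reliable final value when no future data is available for the smoothing. I tried many other methods, and all of them produced results, but none could be tuned the way Savgol can.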

import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import column
from math import pi
from scipy.signal import savgol_filter
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.interpolate import splrep, splev
from scipy.ndimage import gaussian_filter1d
from scipy.signal import lfilter
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt

df_sim = pd.read_csv("/home/20190905_Signal_Smooth_Test.csv")

#sklearn Polynomial*****************************************
poly = PolynomialFeatures(degree=4)
X = df_sim.iloc[:, 0:1].values  # first column as a 2-D feature array
print(X)
y = df_sim.iloc[:, 1].values  # second column as the target
print(y)
X_poly = poly.fit_transform(X)  # expand X into polynomial features once
lin2 = LinearRegression()
lin2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(X_poly), color='red')  # reuse the fitted features
plt.title('Polynomial Regression')
plt.xlabel('Time')
plt.ylabel('Signal')
plt.show()

#scipy interpolate********************************************
bspl = splrep(df_sim['timestamp'], df_sim['signal'], s=5)
bspl_y = splev(df_sim['timestamp'], bspl)
df_sim['signal_spline'] = bspl_y

#scipy gaussian filter****************************************
smooth = gaussian_filter1d(df_sim['signal'], 3)
df_sim['signal_gauss'] = smooth

#scipy lfilter************************************************
n = 5  # the larger n is, the smoother curve will be
b = [1.0 / n] * n
a = 1
histo_filter = lfilter(b, a, df_sim['signal'])
df_sim['signal_lfilter'] = histo_filter
print(df_sim)

#scipy UnivariateSpline**************************************
s = UnivariateSpline(df_sim['timestamp'], df_sim['signal'], s=5)

xs = df_sim['timestamp']
ys = s(xs)
df_sim['signal_univariante'] = ys

#scipy savgol filter****************************************
# window_length=11 samples, polyorder=3; a wider window gives a smoother curve
sg = savgol_filter(df_sim['signal'], 11, 3)
df_sim['signal_savgol'] = sg

df_sim['date'] = pd.to_datetime(df_sim['timestamp'], unit='s')

#plotting it all********************************************
print(df_sim)
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
# Note: Bokeh 3.x renamed plot_width/plot_height to width/height
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
            title="Various Signals y vs Timestamp x")
p.xaxis.major_label_orientation = pi / 4
p.grid.grid_line_alpha = 0.9
p.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p.line(x=df_sim['date'], y=df_sim['signal_spline'], color='blue')
p.line(x=df_sim['date'], y=df_sim['signal_gauss'], color='red')
p.line(x=df_sim['date'], y=df_sim['signal_lfilter'], color='magenta')
p.line(x=df_sim['date'], y=df_sim['signal_univariante'], color='yellow')

p1 = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
            title="Savgol vs Signal")
p1.xaxis.major_label_orientation = pi / 4
p1.grid.grid_line_alpha = 0.9
p1.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p1.line(x=df_sim['date'], y=df_sim['signal_savgol'], color='blue')

output_file("signal.html", title="Signal Test")
show(column(p, p1))  # open a browser

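I am hoping for a result similar to Savitzky-Golay, but with a valid final smoothed value for the data series. None of the other methods offers the same flexibility for tuning the smoothness, and most of them shift the curve to the right. I can provide the CSV file for testing.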
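The end-point problem described above can be reproduced without the CSV file. The following is a minimal sketch on synthetic data (the sine-plus-noise signal, the seed, and the cut-off index `k` are assumptions made only for illustration). It smooths the series once when sample `k - 1` is the newest point and once after later samples have arrived, using the same `savgol_filter(..., 11, 3)` call as the question.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic stand-in for the real signal (assumption: the CSV is not available here).
rng = np.random.default_rng(42)
t = np.arange(200)
signal = np.sin(t / 15.0) + rng.normal(scale=0.2, size=t.size)

k = 150  # pretend only the first k samples have arrived so far

smoothed_at_arrival = savgol_filter(signal[:k], 11, 3)
smoothed_in_hindsight = savgol_filter(signal, 11, 3)

# Same sample (index k - 1), smoothed with and without knowledge of later data.
value_at_arrival = smoothed_at_arrival[-1]
value_in_hindsight = smoothed_in_hindsight[k - 1]

print(f"newest sample, no future data:   {value_at_arrival:.4f}")
print(f"same sample, future data known:  {value_in_hindsight:.4f}")
```

Samples well inside the series get the same smoothed value in both runs; only the last few points, where the window cannot be centered, change once new data arrives.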

1 Answer:

Answer 0 (score: 0):

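It really depends on why you want to smooth the data. Every smoothing method has side effects, such as letting some kinds of "noise" through more than others. Look into the "phase response" of a filter.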
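As a rough illustration of what the phase response means in practice, the sketch below applies a causal n-point moving average (one that uses only the current and past samples) to a straight ramp and measures how far the output trails the input. The ramp and the helper name `causal_moving_average` are assumptions for the example, not part of the question's code.

```python
import numpy as np

def causal_moving_average(y, n):
    """Average of the current sample and the n - 1 samples before it."""
    kernel = np.ones(n) / n
    # Truncating the 'full' convolution to len(y) keeps only past and present samples.
    return np.convolve(y, kernel, mode="full")[: len(y)]

n = 5
ramp = np.arange(20, dtype=float)  # y[t] = t, so any lag shows up as a constant offset
smoothed = causal_moving_average(ramp, n)

# Skip the first n - 1 warm-up samples, where the window is not yet full.
lag = ramp[n - 1:] - smoothed[n - 1:]
print(lag)
```

On a ramp the output trails the input by (n - 1) / 2 samples, which is the shift to the right the question describes. A larger n gives more smoothing and a larger delay.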

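A common technique for avoiding the loss of data at the end of a symmetric filter is simply to forecast your data a few points ahead and use those forecasts. For example, with a 5-term moving average filter you lose 2 data points when computing the final value.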
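The loss of points at the ends can be seen in a small sketch (the eight sample values are made up for illustration). A centered 5-term moving average only produces an output where the full window fits.

```python
import numpy as np

signal = np.array([3.0, 4.0, 6.0, 5.0, 7.0, 9.0, 8.0, 10.0])  # made-up values
window = 5
half = window // 2
kernel = np.ones(window) / window

# 'valid' keeps only positions where the whole window lies inside the data.
centered = np.convolve(signal, kernel, mode="valid")

# centered[i] is the smoothed value for signal[i + half].
covered = np.arange(half, len(signal) - half)
print(len(signal), len(centered))  # 8 4
print(covered)                     # [2 3 4 5]
```

The last two samples (indices 6 and 7) get no smoothed value, which is why the newest point cannot be written to the database without some form of padding.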

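To forecast those two points, you could use the auto_arima() function from the pmdarima module, or look at the fbprophet module (which I have found very useful for this kind of situation).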
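The forecast-then-filter idea can be sketched as follows. The forecaster here is a plain straight-line fit with `np.polyfit`, used purely as a stand-in assumption; it is not the ARIMA or Prophet model the answer recommends, and the helper names `forecast_linear` and `smooth_with_forecast` are hypothetical. A real forecaster would replace `forecast_linear`.

```python
import numpy as np

def forecast_linear(y, steps, fit_points=10):
    """Hypothetical stand-in forecaster: extend a line fitted to the last samples."""
    tail = np.asarray(y[-fit_points:], dtype=float)
    t = np.arange(len(tail))
    slope, intercept = np.polyfit(t, tail, deg=1)
    future_t = np.arange(len(tail), len(tail) + steps)
    return slope * future_t + intercept

def smooth_with_forecast(y, window=5):
    """Centered moving average with the trailing edge padded by forecasts.

    Assumes an odd window. Returns len(y) - window // 2 values; the last one
    is the smoothed value for the newest sample y[-1].
    """
    y = np.asarray(y, dtype=float)
    half = window // 2
    padded = np.concatenate([y, forecast_linear(y, half)])
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Synthetic example data (assumption: noisy upward trend).
rng = np.random.default_rng(0)
signal = np.linspace(0, 10, 60) + rng.normal(scale=0.5, size=60)
smoothed = smooth_with_forecast(signal, window=5)
print(signal[-1], smoothed[-1])
```

The final smoothed value is only as good as the forecast that pads the window, so the choice of forecaster matters more as the window grows.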