Question

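I have two sets of data and want to find a correlation between them. Although the data is quite scattered, there is clearly a relationship. I currently use numpy polyfit (8th order), but the resulting line "wiggles" in places (especially at the beginning and the end), which is not appropriate. Second, I don't think the fit at the start of the line is very good (the curve should be slightly steeper).

How can I get a best-fit "spline" through these data points?

My current code: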

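import numpy as np

# fit regression line: least-squares polynomial of degree 8
regressionLineOrder = 8
regressionLine = np.polyfit(data['x'], data['y'], regressionLineOrder)
# wrap the coefficients in a callable polynomial, p(x)
p = np.poly1d(regressionLine)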

Answer 1

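Take a look at @MatthewDrury's answer to "Why use regularisation in polynomial regression instead of lowering the degree?". It is simply fantastic. The most interesting point comes when he starts discussing the use of natural cubic splines to fit the regression in place of a regularized polynomial of degree 10. You can use the implementation in scipy.interpolate.CubicSpline to do something very similar. For related approaches, scipy.interpolate also includes many classes for other spline methods.

Here is a simple example: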

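import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline

# Note: CubicSpline passes through every point and needs strictly increasing x
cs = CubicSpline(data['x'], data['y'])
# x_min, x_max and some_step are placeholders for your own plotting range
x_range = np.arange(x_min, x_max, some_step)
plt.plot(x_range, cs(x_range), label='Cubic Spline')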
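
Since the question is about noisy scatter rather than exact interpolation, one of the other scipy.interpolate classes may be closer to what is wanted. The following is only a minimal sketch using scipy.interpolate.UnivariateSpline, which fits a smoothing spline instead of passing through every point. The synthetic x and y arrays, the noise level sigma, and the choice s = n * sigma**2 are assumptions made for illustration; they are not taken from the original data and would need tuning on the real data['x'] and data['y'].

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic noisy data standing in for data['x'] and data['y'] (assumption).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))  # x must be in increasing order
sigma = 0.2                               # assumed noise standard deviation
y = np.sqrt(x) + rng.normal(0.0, sigma, x.size)

# k=3 requests a cubic spline; s is the smoothing factor.
# Larger s gives a smoother curve, s=0 would interpolate every point.
# s = n * sigma**2 is only a starting guess, to be tuned by eye.
spline = UnivariateSpline(x, y, k=3, s=x.size * sigma**2)

# Evaluate the fitted spline on a dense grid for plotting.
x_range = np.linspace(x.min(), x.max(), 500)
y_fit = spline(x_range)
```

The result can be plotted the same way as above, for example with plt.plot(x_range, y_fit, label='Smoothing spline'). If the curve still wiggles, increasing s should smooth it further; if it looks too flat at the start, decreasing s lets it follow the data more closely.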

Fitting a spline through scatter
