Question

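The `interpolate` method in pandas uses the valid data to interpolate the NaN values. However, it leaves the existing valid data unchanged, as in the code below.

Is there a way to use the `interpolate` method so that it also changes the existing values and the series becomes smooth?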

In [1]: %matplotlib inline
In [2]: from scipy.interpolate import UnivariateSpline as spl
In [3]: import numpy as np
In [4]: import pandas as pd
In [5]: samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 }
In [6]: x, y = zip(*sorted(samples.items()))

In [7]: df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)

In [8]: df1.loc[x] = np.array(y)[:, None]
In [9]: df1['itp'].interpolate('spline', order=3, inplace=True)
In [10]: df1.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6))

In [11]: df2 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
In [12]: df2.loc[x, 'raw'] = y
In [13]: f = spl(x, y, k=3)
In [14]: df2['itp'] = f(df2.index)
In [15]: df2.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6))

Answer 1

When `Series.interpolate` is used with `method='spline'`, pandas uses `interpolate.UnivariateSpline`.

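The spline returned by `UnivariateSpline` is not guaranteed to pass through the input data points unless `s=0` is given. By default, however, `s=None`, which uses a different smoothing factor and therefore gives a different result.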
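The effect of the smoothing factor can be checked numerically. The following is a minimal sketch that reuses the sample points from the question; the variable names (`smooth`, `exact`, `resid_smooth`, `resid_exact`) are only illustrative.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sample points from the question.
x = np.array([0.0, 0.4, 0.5, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.5, 0.9, 0.7, 0.3, 1.0])

smooth = UnivariateSpline(x, y, k=3)       # s=None: default smoothing factor
exact = UnivariateSpline(x, y, k=3, s=0)   # s=0: spline through every point

# Largest distance between each spline and the original data points.
resid_smooth = np.abs(smooth(x) - y).max()
resid_exact = np.abs(exact(x) - y).max()

print("max residual with s=None:", resid_smooth)
print("max residual with s=0:   ", resid_exact)
```

With `s=0` the residual should be zero up to rounding error, while the default smoothing leaves a visible gap at the data points.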

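The `Series.interpolate` method always fills in NaN values without changing the non-NaN values. There is no way to make `Series.interpolate` modify the non-NaN values. So when `s != 0`, the filled-in values come from a smoothed spline that need not pass through the untouched original points, and the result shows jagged jumps.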
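A small check of this behaviour is sketched below. The series `raw` is made-up data for illustration (the same six values as in the question, placed on a shorter index), and the example assumes SciPy is installed, since pandas needs it for `method='spline'`.

```python
import numpy as np
import pandas as pd

# Illustrative series: six known values with NaN gaps between them.
raw = pd.Series(
    [0.0, np.nan, np.nan, 0.5, 0.9, 0.7, np.nan, 0.3, np.nan, 1.0],
    index=np.linspace(0, 1, 10),
)

# Non-inplace call: returns a new Series with the NaN gaps filled.
filled = raw.interpolate(method='spline', order=3)

known = raw.notna()
print(filled[known].equals(raw[known]))  # known values are left as they were
print(filled[~known])                    # only these positions were computed
```

Only the positions that were NaN receive spline values; every originally known value is carried over unchanged.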

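So if you want the `s=None` (default) spline interpolation but without the jagged jumps, then, as you have already found, you have to call `UnivariateSpline` directly and overwrite all the values in `df['itp']`: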

df['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df.index)

If you want a cubic spline that passes through all the non-NaN data points, then use `s=0`:

df['itp'] = df['itp'].interpolate('spline', order=3, s=0)
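A self-contained sketch of the `s=0` variant follows. It assigns the result back to the column instead of calling `inplace=True` on `df['itp']`; this is an assumption about recent pandas versions, where an in-place call on a selected column may act on a temporary object and leave `df` unchanged. The known values are placed by integer position (hypothetical helper array `positions`) to avoid relying on exact floating-point index labels.

```python
import numpy as np
import pandas as pd

index = np.linspace(0, 1, 31)

# Positions of x = 0.0, 0.4, 0.5, 0.6, 0.8, 1.0 on the 31-point grid.
positions = [0, 12, 15, 18, 24, 30]
values = [0.0, 0.5, 0.9, 0.7, 0.3, 1.0]

raw = np.full(index.size, np.nan)
raw[positions] = values

df = pd.DataFrame({'raw': raw, 'itp': raw.copy()}, index=index)

# s=0 is forwarded to UnivariateSpline, so the cubic spline passes
# through every known point; the result is assigned back explicitly.
df['itp'] = df['itp'].interpolate(method='spline', order=3, s=0)

known = df['raw'].notna()
print(np.allclose(df.loc[known, 'itp'], df.loc[known, 'raw']))
```

Because the spline with `s=0` already passes through the known points, the filled curve joins them without jumps even though those points are never modified.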

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.interpolate as interpolate

samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 }
x, y = zip(*sorted(samples.items()))

fig, ax = plt.subplots(nrows=3, sharex=True)
df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
df1.loc[x] = np.array(y)[:, None]

df2 = df1.copy()
df3 = df1.copy()

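# 1) Default smoothing (s=None): only the NaNs are filled, so the curve jumps at the known points.
df1['itp'] = df1['itp'].interpolate('spline', order=3)
# 2) Call UnivariateSpline directly and overwrite every value, known points included.
df2['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df2.index)
# 3) s=0: the cubic spline passes through all non-NaN points.
df3['itp'] = df3['itp'].interpolate('spline', order=3, s=0)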
for i, df in enumerate((df1, df2, df3)):
    df.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6), ax=ax[i])
plt.show()
