Question

Dear Stack Overflow community,

I would like to interpolate data along a time series. I have the following data points:

90      0
270     22.5
450     294
630     786
810     833.5
990     473.5
1170    60.375
1350    0

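The first column is the time in minutes (the x-axis) and the second column is my data (the y-axis).

I want to interpolate the data with a polynomial function and find the values at the following x positions: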

[60, 120, 180, 240, 300, 360, 420, 480, 540, 600, 660, 720, 780, 840, 900, 960, 1020, 1080, 1140, 1200, 1260, 1320, 1380, 1440]

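These are actually hours that I converted to minutes: my initial data is given at 1h30 (90), 4h30 (270), and so on, and I would like to interpolate it to get output at 1h, 2h, etc. for the whole day.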

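I first used pandas.Series.interpolate (method 1). I compared different polynomial orders to find the most suitable one. For every order, the curve passes through all of my data points. Order 2 seems to fit well, and so do all the others, as the plot shows.

I would then reindex the series that fits best to index_1H to get my hourly data.

However, I just compared the results with Excel, and there it is clear that the order 2 and order 3 polynomials do not even pass through my data points.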

I then used np.polyfit (method 2), which gives results similar to Excel's.

import pandas as pd
import numpy as np

# FIRST METHOD WITH pandas.Series.interpolate 

indexx = [90, 270, 450, 630, 810, 990, 1170, 1350]
dataa = [0, 22.5, 294, 786, 833.5, 473.5, 60.375, 0]
s = pd.Series(dataa,index=indexx)

index_30min = [60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, 390, 420, 450, 480, 510, 540, 570, 600, 630, 660, 690, 720, 750, 780, 810, 840, 870, 900, 930, 960, 990, 1020, 1050, 1080, 1110, 1140, 1170, 1200, 1230, 1260, 1290, 1320, 1350, 1380, 1410, 1440]
index_1H = [60, 120, 180, 240, 300, 360, 420, 480, 540, 600, 660, 720, 780, 840, 900, 960, 1020, 1080, 1140, 1200, 1260, 1320, 1380, 1440]

s_1H = s.reindex(index_1H)
s_30min = s.reindex(index_30min)

s2 = s_30min.interpolate(method='polynomial', order=2)
s3 = s_30min.interpolate(method='cubic')
s4 = s_30min.interpolate(method='quadratic')
s5 = s_30min.interpolate(method='polynomial', order=5)
s7 = s_30min.interpolate(method='polynomial', order=7)

polynome = pd.concat([s2, s3, s4, s5, s7], axis=1)
polynome.columns = ["s2", "s3", "s4", "s5", "s7"]
polynome = polynome.assign(init=s)
polynome.plot()


# SECOND METHOD WITH NUMPY POLYFIT

x = np.array([90, 270, 450, 630, 810, 990, 1170, 1350])
y = np.array([0, 22.5, 294, 786, 833.5, 473.5, 60.375, 0])
z = np.polyfit(x, y, 5) # 5 is the order here

p = np.poly1d(z)

Results: with method 1, I get the following plot: all polynome fits

With method 2, evaluating p(450) gives 294.0000000001205 with order 7 and 479.1227678571424 with order 2. These values match what I found in Excel. The Excel chart is here: polynom order 2 and 3 do not fit at all
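A quick numerical check of the method 2 values, as a minimal sketch. np.polyfit performs a least-squares fit of a single polynomial, so with 8 points a degree 7 polynomial has enough coefficients to pass through all of them, while a degree 2 polynomial does not. The names p2, p7, residual_2 and residual_7 are only for illustration.

```python
import numpy as np

x = np.array([90, 270, 450, 630, 810, 990, 1170, 1350], dtype=float)
y = np.array([0, 22.5, 294, 786, 833.5, 473.5, 60.375, 0])

# Degree 2: three coefficients for eight points, so this is a
# least-squares fit and the parabola generally misses the data points.
p2 = np.poly1d(np.polyfit(x, y, 2))

# Degree 7: eight coefficients for eight points, so the polynomial
# can pass through every point (up to rounding error).
p7 = np.poly1d(np.polyfit(x, y, 7))

# Largest distance between each fitted polynomial and the original data.
residual_2 = np.abs(p2(x) - y).max()
residual_7 = np.abs(p7(x) - y).max()

print("p2(450) =", p2(450), " p7(450) =", p7(450))
print("max residual, degree 2:", residual_2)
print("max residual, degree 7:", residual_7)
```

If the degree 2 residual is large while the degree 7 residual is close to zero, that would be consistent with what Excel shows for its low-order trendlines.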

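I would like to understand exactly what numpy.polyfit does and what pandas.Series.interpolate does. The latter relies on scipy.interpolate.interp1d, but it is still unclear to me what is actually being computed.

Most importantly, I would like to know which of the two methods is the correct one!

Many thanks, Anaïs

pd.Series interpolate and np.polyfit give different results - why?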
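To narrow down what method 1 computes, here is a small sketch that compares pandas.Series.interpolate with a direct call to scipy.interpolate.interp1d. It assumes, based on my reading of the pandas documentation, that method='polynomial' forwards order as interp1d's kind argument, where an integer kind means a spline of that order (a piecewise curve through every point, not one global polynomial). The names grid, via_pandas, via_scipy and max_diff are hypothetical, and the grid is limited to the range covered by the data.

```python
import pandas as pd
from scipy.interpolate import interp1d

x = [90, 270, 450, 630, 810, 990, 1170, 1350]
y = [0, 22.5, 294, 786, 833.5, 473.5, 60.375, 0]
s = pd.Series(y, index=x, dtype=float)

# Hourly positions inside the measured range (120..1320), merged with
# the original index so the known points stay in the series.
hours = [m for m in range(60, 1441, 60) if x[0] <= m <= x[-1]]
grid = sorted(set(x) | set(hours))

# Method 1 as in the question: fill the NaN rows created by reindex.
via_pandas = s.reindex(grid).interpolate(method="polynomial", order=2)

# Direct SciPy call: kind=2 requests a second-order spline.
spline = interp1d(x, y, kind=2)
via_scipy = pd.Series(spline(grid), index=grid)

# Largest disagreement between the two results on the shared grid.
max_diff = (via_pandas - via_scipy).abs().max()
print("max difference pandas vs interp1d:", max_diff)
print(via_pandas.loc[hours])
```

If the two results agree, method 1 is spline interpolation through the data, whereas np.polyfit is a single least-squares polynomial, so the two would not be expected to match at low orders.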

0 answers: