I have the following code, which shows that poly1d is not doing what I expect:
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data_set = pd.DataFrame()
data_set['time_stamp'] = ['2018-10-03 21:59:00', '2018-10-04 04:11:00',
                          '2018-10-05 02:15:00', '2018-10-05 22:16:00',
                          '2018-10-07 12:23:00', '2018-10-08 22:07:00',
                          '2018-10-11 06:21:00', '2018-10-11 17:32:00',
                          '2018-10-11 23:45:00', '2018-10-13 09:37:00',
                          '2018-10-15 02:49:00', '2018-10-15 06:02:00',
                          '2018-10-15 19:01:00', '2018-10-16 05:49:00',
                          '2018-10-18 18:00:00', '2018-10-21 01:32:00',
                          '2018-10-23 10:54:00', '2018-10-28 17:57:00',
                          '2018-10-31 14:54:00']
data_set['value'] = [0.033567, 0.034284, 0.03351599, 0.034715, 0.033909,
                     0.03463999, 0.031394, 0.032193, 0.030485, 0.031977,
                     0.030857, 0.03339099, 0.031096, 0.032014, 0.030989,
                     0.03185099, 0.03107, 0.03178, 0.030868]
# Index by the parsed timestamps so the plot's x-axis is real time
data_set.set_index(pd.DatetimeIndex(data_set.time_stamp), inplace=True)

fig = plt.figure(figsize=(16, 25))
ax = plt.subplot(211)
data_set['value'].plot()

# Hard-coded points: x holds integer row positions, y the values to fit
y = [0.03463999, 0.032014, 0.031075]
x = [2, 7, 18]
poly = np.polyfit(x, y, 1)

# Evaluate the fitted line at row positions min(x) .. max(x) - 1
# (range() stops before max(x))
range_interested_in = range(min(x), max(x))
line_x = np.poly1d(poly)(range_interested_in)

# Pair each evaluated value with the timestamp of the same row
pdline = pd.DataFrame({
    'time_stamp': data_set.time_stamp.iloc[min(x):max(x)].values,
    'value': line_x,
})
pdline.set_index(pd.DatetimeIndex(pdline.time_stamp), inplace=True)
pdline['value'].plot()
plt.show()
```
When I run this, I get a series of joined line segments instead of a single straight line, and I can't work out why.
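One likely explanation, offered as a sketch rather than a definitive answer: `np.polyfit` is given integer row positions (2, 7, 18) as x, so the fitted line is straight in row-position space. The plot's x-axis, however, is the `DatetimeIndex`, and because the timestamps are unevenly spaced, equal steps in row position become unequal steps in time, so the line bends at every sample. (Separately, `range(min(x), max(x))` stops at row 17.) The code below assumes one possible fix, fitting against a numeric time axis (days since the first timestamp); the names `ts`, `t`, `pos` and `line` are illustrative only.

```python
import numpy as np
import pandas as pd

# Timestamps from the question; note the uneven gaps between samples
ts = pd.DatetimeIndex([
    '2018-10-03 21:59:00', '2018-10-04 04:11:00', '2018-10-05 02:15:00',
    '2018-10-05 22:16:00', '2018-10-07 12:23:00', '2018-10-08 22:07:00',
    '2018-10-11 06:21:00', '2018-10-11 17:32:00', '2018-10-11 23:45:00',
    '2018-10-13 09:37:00', '2018-10-15 02:49:00', '2018-10-15 06:02:00',
    '2018-10-15 19:01:00', '2018-10-16 05:49:00', '2018-10-18 18:00:00',
    '2018-10-21 01:32:00', '2018-10-23 10:54:00', '2018-10-28 17:57:00',
    '2018-10-31 14:54:00',
])

# Numeric time axis: days elapsed since the first sample
t = (ts - ts[0]).total_seconds().to_numpy() / 86400.0

x = [2, 7, 18]
y = [0.03463999, 0.032014, 0.031075]
pos = np.arange(min(x), max(x) + 1)  # includes max(x), unlike range(min(x), max(x))

# Fit against row positions (as in the question), then measure the slope in time
by_position = np.poly1d(np.polyfit(x, y, 1))(pos)
slope_vs_time = np.diff(by_position) / np.diff(t[pos])

# Fit against the numeric time axis instead
coef = np.polyfit(t[x], y, 1)
by_time = np.poly1d(coef)(t[pos])
slope_time_fit = np.diff(by_time) / np.diff(t[pos])

print("slope per day between rows, position fit:", np.round(slope_vs_time, 6))
print("slope per day between rows, time fit:    ", np.round(slope_time_fit, 6))

# Series that could be plotted on the same axes as data_set['value']
line = pd.Series(by_time, index=ts[pos])
```

With the position fit the slope per day changes from row to row, which is exactly the kinked appearance; with the time fit it is constant, so `line.plot()` should draw one straight segment. Note that the two fits are different lines, because the chosen points sit at different relative distances in time than in row count.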
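My problem is this: I have a data set from which I randomly pick two points. I use poly1d to draw a straight line through those two points and then check how many of the other points lie within a percentage threshold of that line. For this snippet I have hard-coded the data set, the timestamps and the points that were found.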
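The threshold check described above could look roughly like this. It is a sketch under several assumptions: "within a percentage threshold" is taken to mean that the relative deviation `|y - ŷ| / |ŷ|` is at most `threshold`; the function name `points_within_threshold`, the 2% threshold and the fixed random seed are hypothetical; and x is simply the row position for brevity (a numeric time axis would work the same way).

```python
import numpy as np

def points_within_threshold(x, y, i, j, threshold):
    """Fit a line through points i and j and return a boolean mask of the
    points whose relative deviation from that line is at most `threshold`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A degree-1 fit through exactly two points is the line joining them
    line = np.poly1d(np.polyfit(x[[i, j]], y[[i, j]], 1))
    predicted = line(x)
    # Relative (percentage) distance of every point from the line
    rel_dev = np.abs(y - predicted) / np.abs(predicted)
    return rel_dev <= threshold

values = [0.033567, 0.034284, 0.03351599, 0.034715, 0.033909,
          0.03463999, 0.031394, 0.032193, 0.030485, 0.031977,
          0.030857, 0.03339099, 0.031096, 0.032014, 0.030989,
          0.03185099, 0.03107, 0.03178, 0.030868]
x = np.arange(len(values))      # row positions (hypothetical choice of x)
rng = np.random.default_rng(0)  # seed fixed only for reproducibility
i, j = rng.choice(len(values), size=2, replace=False)

mask = points_within_threshold(x, values, i, j, threshold=0.02)
print(f"line through rows {i} and {j}: {mask.sum()} of {len(values)} points within 2%")
```

Returning a boolean mask rather than a count keeps the information about *which* points are inliers, which is what a later refit needs.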
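Next, I want to redo the poly1d fit so that the best-fit line is more accurate across all the points that fall within the threshold, but their indices are not contiguous, which seems to break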
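On the non-contiguous indices: `np.polyfit` does not need consecutive x values, only that x and y are aligned element for element. A minimal sketch of the refit, assuming the inliers are available as a boolean mask; the helper name `refit_on_inliers` is hypothetical, and the mask below is hand-picked purely for illustration, not computed from the data. Selecting both x and y with the same mask keeps them aligned, and the refitted line can then be evaluated on any x grid, such as every row.

```python
import numpy as np

def refit_on_inliers(x, y, mask, deg=1):
    """Least-squares polynomial fit using only the points selected by `mask`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Boolean indexing keeps x and y paired even when the selected
    # positions are scattered (e.g. rows 6, 7, 9, 10, ...)
    return np.poly1d(np.polyfit(x[mask], y[mask], deg))

values = [0.033567, 0.034284, 0.03351599, 0.034715, 0.033909,
          0.03463999, 0.031394, 0.032193, 0.030485, 0.031977,
          0.030857, 0.03339099, 0.031096, 0.032014, 0.030989,
          0.03185099, 0.03107, 0.03178, 0.030868]
x = np.arange(len(values))

# Hypothetical inlier mask, hand-picked for illustration only
mask = np.zeros(len(values), dtype=bool)
mask[[6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18]] = True
print("inlier rows:", np.flatnonzero(mask))

best_fit = refit_on_inliers(x, values, mask)
full_line = best_fit(x)  # evaluate on every row, not only the inliers
```

Picking a random pair, collecting the points within the threshold and refitting on them is essentially one iteration of RANSAC, so repeating it and keeping the pair with the most inliers is a natural extension.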