Question

Suppose

import numpy

quarters = numpy.arange(start=1947, stop=2017,       step=1/4 )
months   = numpy.arange(start=1947, stop=2016+10/12, step=1/12)

Why does the comparison

months[3] < quarters[1]  # True

evaluate to True, and how can I avoid this?
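One way to see where the difference comes from is sketched below. It assumes the behaviour described in recent versions of the numpy.arange documentation: the array is filled using the effective step (start + step) - start rather than step itself. Near 1947 that difference can only be a multiple of about 2.3e-13 (the spacing of doubles there), so it is not exactly 1/12, and the error grows with the index. The names start, step and effective_step are illustrative.

```python
import numpy as np

start, step = 1947.0, 1 / 12

# arange is documented to fill the array using the *effective* step
# (start + step) - start, which is rounded to the coarse float spacing near 1947.
effective_step = (start + step) - start
print(effective_step == step)               # the increment is not exactly 1/12
print(start + 3 * effective_step < 1947.25) # so the 4th month lands short of 1947.25

months = np.arange(start=1947, stop=2016 + 10 / 12, step=1 / 12)
quarters = np.arange(start=1947, stop=2017, step=1 / 4)

# months[3] sits one representable double below quarters[1] (1947.25 is exact)
print(months[3] == np.nextafter(quarters[1], 0.0))
```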

Context

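I'm working with some economic data and need to interpolate a quarterly time series in order to artificially obtain monthly data. In the code below, I assume: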

import numpy as np
import scipy.interpolate as ip

So I go on to define the time domain of my data:

quarters = np.arange(start=1947, stop=2017,       step=1/4 )
months   = np.arange(start=1947, stop=2016+10/12, step=1/12)

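The original quarterly series does indeed run from the first quarter of 1947 (labelled "1947-1-1" in yyyy-m-d format) to the last quarter of 2016 (labelled "2016-10-1"). A quick check confirms that the two domains coincide, so months is simply "denser" than quarters: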

np.min(quarters) == np.min(months)  # True
np.max(quarters) == np.max(months)  # True
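Equal minima and maxima do not by themselves show that the interior points line up. A stricter check, sketched here with the same definitions (every_third_month is an illustrative name), compares every third month with the corresponding quarter element by element:

```python
import numpy as np

quarters = np.arange(start=1947, stop=2017, step=1 / 4)
months = np.arange(start=1947, stop=2016 + 10 / 12, step=1 / 12)

# If the grids really nest, every third month coincides with a quarter.
every_third_month = months[::3]
print(every_third_month.shape == quarters.shape)
print(np.array_equal(every_third_month, quarters))   # exact element-wise comparison
```

With these arange grids the exact comparison fails, which is the same rounding issue the question is about.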

Then I turn to the real thing. I imported a time series with np.genfromtxt() and named it gdp, and I made sure I got it right, so that

gdp.shape == quarters.shape  # True

I'm interested in the first difference of this data:

dgdp = np.diff(gdp)
dgdp = np.concatenate(([np.nan], dgdp))  # pad with NaN so dgdp has the same length as quarters
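A toy run of the differencing step, using made-up quarterly values in place of the real gdp series (the numbers are hypothetical), shows why the NaN is prepended: np.diff returns one element fewer than its input, and the padding restores the original length so that index i still refers to quarter i.

```python
import numpy as np

# Hypothetical quarterly levels standing in for the real gdp series.
gdp = np.array([100.0, 102.0, 101.5, 104.0])

dgdp = np.diff(gdp)                       # length len(gdp) - 1
dgdp = np.concatenate(([np.nan], dgdp))   # first quarter has no predecessor

print(dgdp)
print(dgdp.shape == gdp.shape)
```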

I want to interpolate the first difference at monthly frequency:

interp_df = ip.interp1d(quarters[1:], dgdp[1:])

This works fine: interp_df is indeed an object of the class that ip.interp1d() is supposed to return.

However, as soon as I try to get the interpolated data

dgdp_mon = interp_df(months[3:])

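SciPy complains with ValueError: A value in x_new is below the interpolation range. Debugging and reading SciPy's source shows that the problem lies in the inequality check inside the method _check_bounds(self, x_new) of the submodule interpolate.py, which essentially boils down to the problem stated above.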
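The failing check can be reproduced without SciPy. The function below is a hypothetical, simplified stand-in for the lower-bound part of interp1d's bounds check (not SciPy's actual code): it flags any requested point strictly below the first interpolation node, which in the call above is quarters[1].

```python
import numpy as np

def hypothetical_below_range(x_new, x):
    """Simplified stand-in for a lower-bound check: True if any requested
    point lies strictly below the first sample point."""
    return bool(np.any(x_new < x[0]))

quarters = np.arange(start=1947, stop=2017, step=1 / 4)
months = np.arange(start=1947, stop=2016 + 10 / 12, step=1 / 12)

# The interpolant is built on quarters[1:], so its range starts at quarters[1];
# the first requested point, months[3], falls just below it.
print(hypothetical_below_range(months[3:], quarters[1:]))
```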

Answer 1

Use numpy.linspace instead:

quarters = numpy.linspace(start=1947, stop=2017,        endpoint=False, num=(2017-1947)*4)
months   = numpy.linspace(start=1947, stop=2016+10./12, endpoint=False, num=(2016-1947)*12 + 10)

According to the numpy.arange documentation:

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.
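A quick check, assuming the linspace definitions above: linspace derives its spacing as (stop - start) / num, a full-precision double, instead of arange's effective step (start + step) - start, so the point that tripped the comparison in the question now lines up. This only illustrates that one point; it is not a proof that every month aligns exactly with a quarter.

```python
import numpy as np

quarters = np.linspace(start=1947, stop=2017, endpoint=False, num=(2017 - 1947) * 4)
months = np.linspace(start=1947, stop=2016 + 10 / 12, endpoint=False,
                     num=(2016 - 1947) * 12 + 10)

# With endpoint=False the spacing is (stop - start) / num.
print(months[3] < quarters[1])
print(months[3] == quarters[1])
```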

Answer 2

It's caused by floating-point rounding error. Generate the data as integers, then divide to compute the values you need.

import numpy

quarters = numpy.arange(start=1947, stop=2017,       step=1/4 )
months   = numpy.arange(start=1947, stop=2016+10/12, step=1/12)

print([months[3], quarters[1]])
print(months[3] < quarters[1])
# [1947.2499999999998, 1947.25]
# True


quarters = numpy.arange(start=1947*4,  stop=2017*4,       step=1)/4
months   = numpy.arange(start=1947*12, stop=2016*12 + 10, step=1)/12

print([months[3], quarters[1]])
print(months[3] < quarters[1])
# [1947.25, 1947.25]
# False
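A quick way to confirm that the integer-based grids nest exactly: months[3*k] is (1947*12 + 3*k)/12 and quarters[k] is (1947*4 + k)/4. These are the same real number, which is exactly representable as a double, and IEEE 754 division is correctly rounded, so both divisions return that exact value. Every third month should therefore equal the corresponding quarter bit for bit, which also keeps months[3:] inside the range of quarters[1:] for the interpolation in the question.

```python
import numpy as np

quarters = np.arange(start=1947 * 4, stop=2017 * 4, step=1) / 4
months = np.arange(start=1947 * 12, stop=2016 * 12 + 10, step=1) / 12

# Each month index 3*k should land exactly on quarter index k.
print(np.array_equal(months[::3], quarters))
print(months[3:].min() >= quarters[1], months[3:].max() <= quarters[-1])
```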
