Question

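I am reading temperature entries stored in a file. An entry is generated whenever the temperature value changes, so entries are not stored at regular intervals.

An example of the data: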

timestamp  | temperature
-----------+------------
1477400000 | 31
1477400001 | 31.5
1477400003 | 32
1477400010 | 31.5
1477400200 | 32
1477400201 | 32.5

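I need a fast way to get the temperature at any timestamp, even if it is not in the index. For example, the temperature at 1477400002 is 31.5, but 1477400002 is not in the index.

To make this easier to reproduce, the same DataFrame can be generated as follows: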

import pandas as pd

df = pd.DataFrame(data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
                  index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201])

Answer 1

Assuming the index is sorted, you can use np.searchsorted to return the ordinal position and then use iloc to index into df:

In [84]:
# side='right' makes an exact match return its own row, not the previous one
df.iloc[max(0, np.searchsorted(df.index, 1477400002, side='right') - 1)]

Out[84]:
temperature    31.5
Name: 1477400001, dtype: float64

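Here I subtract 1 from the result of np.searchsorted to get the lower bound. To guard against the case where that would give -1, I also take the max of 0 and the computed value, so looking up 1477400000 still returns the first entry.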
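When many timestamps need to be looked up at once, the same idea can be vectorized. The following is a minimal sketch rather than part of the original answer: the helper name `temperature_at` is hypothetical, and it assumes a sorted index and uses side='right' so that a timestamp that exactly matches an index entry returns that entry's own value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

def temperature_at(frame, timestamps):
    """Hypothetical helper: last recorded temperature at or before each timestamp."""
    # side='right' counts the index entries <= each timestamp;
    # subtracting 1 turns that count into the position of the last such entry.
    pos = np.searchsorted(frame.index.to_numpy(), timestamps, side='right') - 1
    # Timestamps earlier than the first entry would give -1; clamp them to row 0.
    pos = np.clip(pos, 0, None)
    return frame['temperature'].to_numpy()[pos]

result = temperature_at(df, [1477400002, 1477400003, 1477400000, 1477400150])
print(result)
```

Clamping to row 0 mirrors the max(0, ...) guard above; whether the first reading is a sensible answer for timestamps before the data starts depends on the use case.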

Answer 2

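You can also use the Index.get_loc method with its method argument set to 'pad', which looks up the previous index value when no exact match is found. Then use DataFrame.get_value, passing the name attribute of the row found and the column of interest, temperature, to retrieve the value at that index, as shown:

Demo: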

df.get_value(df.iloc[df.index.get_loc(1477400002, method='pad')].name, 'temperature')
# 31.5

df.get_value(df.iloc[df.index.get_loc(1477400003, method='pad')].name, 'temperature')
# 32.0

This assumes that queries start after the first index value, since you want the previous value at any given point in time.

Timings:
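On recent pandas releases, DataFrame.get_value and the method argument of Index.get_loc are no longer available. A similar "previous value" lookup can be sketched with Index.get_indexer(..., method='pad') or Series.asof. This is an alternative under that assumption, not the code from the answer above, and it still relies on a sorted index.

```python
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

# get_indexer returns integer positions; method='pad' selects the last
# index value that is <= each target. It returns -1 when there is no such
# value (a target before the first entry), so check for -1 before indexing.
pos = df.index.get_indexer([1477400002, 1477400003], method='pad')
values = df['temperature'].to_numpy()[pos]
print(values)

# Series.asof performs the same kind of "last value at or before" lookup
# for a single label.
single = df['temperature'].asof(1477400002)
print(single)
```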

%timeit df.get_value(df.iloc[df.index.get_loc(1477400002, method='pad')].name, 'temperature')
1000 loops, best of 3: 164 µs per loop
