I have a CSV file of weather station data with non-continuous timestamps:
logstamp temp rh snow wind gust wind_dir
2018-01-26 21:00:00 -10.120 63.93 207.1 4.018 9.806 173.900
2018-01-26 22:00:00 -9.750 58.54 207.0 3.856 11.149 158.500
2018-01-26 23:00:00 -9.710 60.92 206.9 6.505 13.759 159.100
2018-01-27 00:00:00 -10.110 57.45 206.7 6.602 12.488 167.700
2018-01-28 13:00:00 -7.574 84.90 212.4 5.594 15.736 134.100
2018-01-28 14:00:00 -4.347 88.20 213.1 5.663 15.242 170.700
2018-01-28 15:00:00 -1.360 89.30 213.0 4.896 19.051 175.300
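I want to use the pandas reindex function to add rows for the missing timestamps so that I can interpolate data for the missing times. For example, our weather station dropped out on January 27 in the table above.
I tried using the reindex function with pandas. This produces new rows for the missing times, but it converts all of the original column data to NaN.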
ts1 = pd.read_csv("mtmaya-2018-02-20.csv", index_col='logstamp', infer_datetime_format='TRUE')
index = pd.date_range(ts1.index.min(),ts1.index.max(), freq="H")
ts1 = ts1.reindex(index)
air temp rh snow wind spd wind spd max wind dir \
2018-01-25 14:00:00 NaN NaN NaN NaN NaN NaN
2018-01-25 15:00:00 NaN NaN NaN NaN NaN NaN
2018-01-25 16:00:00 NaN NaN NaN NaN NaN NaN
I think I am missing something.
Answer 0 (score: 0)
I think you need to call interpolate. For your example dataframe:
# This is the same as what you had
index = pd.date_range(ts1.index.min(),ts1.index.max(), freq="H")
# Here, reindex as you had done before, but chain an 'interpolate' on top of that
ts1 = ts1.reindex(index).interpolate(method='time')
If you print the first 10 rows, you will see that the NaNs for 2018-01-27 have been interpolated.
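One more thing worth checking, offered as a guess rather than something confirmed from your file: if every original column turns into NaN after reindex, the index was probably read in as plain strings, so none of its labels match the Timestamps produced by date_range. The sketch below rebuilds your sample rows as comma-separated text (an assumption about the file layout), passes parse_dates=True so the logstamp index becomes a DatetimeIndex, and then applies the same reindex and interpolate chain.

```python
import io

import pandas as pd

# Sample rows from the question, written as comma-separated text.
# Assumption: the real file is comma-separated with these column names.
csv_text = """logstamp,temp,rh,snow,wind,gust,wind_dir
2018-01-26 21:00:00,-10.120,63.93,207.1,4.018,9.806,173.900
2018-01-26 22:00:00,-9.750,58.54,207.0,3.856,11.149,158.500
2018-01-26 23:00:00,-9.710,60.92,206.9,6.505,13.759,159.100
2018-01-27 00:00:00,-10.110,57.45,206.7,6.602,12.488,167.700
2018-01-28 13:00:00,-7.574,84.90,212.4,5.594,15.736,134.100
2018-01-28 14:00:00,-4.347,88.20,213.1,5.663,15.242,170.700
2018-01-28 15:00:00,-1.360,89.30,213.0,4.896,19.051,175.300
"""

# parse_dates=True asks read_csv to parse the index column as datetimes,
# so the index is a DatetimeIndex instead of plain strings.
ts1 = pd.read_csv(io.StringIO(csv_text), index_col="logstamp", parse_dates=True)

# Full hourly range between the first and last record.
# "h" is the hourly alias; older code often spells it "H".
full_index = pd.date_range(ts1.index.min(), ts1.index.max(), freq="h")

# reindex adds the missing hours as NaN rows; interpolate then fills them,
# weighting by the time distance between the surrounding known rows.
filled = ts1.reindex(full_index).interpolate(method="time")

print(ts1.index.dtype)
print(filled.loc["2018-01-27"].head())
```

With a datetime index the original rows keep their values after reindex, and only the newly added hours are filled in by interpolate.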