Question

I have a CSV file of weather station data with non-contiguous timestamps:

logstamp               temp     rh     snow      wind          gust    wind_dir                         
2018-01-26 21:00:00   -10.120  63.93  207.1     4.018         9.806   173.900   
2018-01-26 22:00:00    -9.750  58.54  207.0     3.856        11.149   158.500   
2018-01-26 23:00:00    -9.710  60.92  206.9     6.505        13.759   159.100   
2018-01-27 00:00:00   -10.110  57.45  206.7     6.602        12.488   167.700   
2018-01-28 13:00:00    -7.574  84.90  212.4     5.594        15.736   134.100   
2018-01-28 14:00:00    -4.347  88.20  213.1     5.663        15.242   170.700   
2018-01-28 15:00:00    -1.360  89.30  213.0     4.896        19.051   175.300

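I want to use the pandas reindex function to add rows for the missing timestamps, so that I can then interpolate data for the missing times. For example, in the table above our weather station dropped out on January 27.

I tried the pandas reindex function. This produces new rows for the missing times, but it converts all of the original column data to NaN.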

ts1 = pd.read_csv("mtmaya-2018-02-20.csv", index_col='logstamp', infer_datetime_format='TRUE') 

index = pd.date_range(ts1.index.min(),ts1.index.max(), freq="H")
ts1 = ts1.reindex(index)

                     air temp  rh  snow  wind spd  wind spd max  wind dir  \
2018-01-25 14:00:00       NaN NaN   NaN       NaN           NaN       NaN   
2018-01-25 15:00:00       NaN NaN   NaN       NaN           NaN       NaN   
2018-01-25 16:00:00       NaN NaN   NaN       NaN           NaN       NaN

I think I am missing something.

Answer 1

I think you need to call interpolate. For your example dataframe:

# This is the same as what you had
index = pd.date_range(ts1.index.min(),ts1.index.max(), freq="H")

# Here, reindex as you had done before, but chain an 'interpolate' on top of that
ts1 = ts1.reindex(index).interpolate(method='time')

If you print the first 10 rows, you will see that the NaNs for 2018-01-27 have been interpolated.
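One caveat: interpolate can only fill gaps between values that survive the reindex. In the question every original row came back as NaN, which is what happens when the index read from the CSV is still made of strings: reindex then finds no labels that match the DatetimeIndex from date_range. infer_datetime_format has no effect unless parse_dates is also enabled. The sketch below reproduces that and shows one possible fix. It is an illustration under assumptions: the real file is not available, so csv_text is a hypothetical comma-separated stand-in built from three rows and two columns of the question's sample table.

```python
import io

import pandas as pd

# Hypothetical stand-in for "mtmaya-2018-02-20.csv": three rows and two
# columns copied from the sample table, assumed to be comma-separated.
csv_text = """logstamp,temp,rh
2018-01-26 23:00:00,-9.710,60.92
2018-01-27 00:00:00,-10.110,57.45
2018-01-28 13:00:00,-7.574,84.90
"""

# Without parse_dates the index holds plain strings, not timestamps.
raw = pd.read_csv(io.StringIO(csv_text), index_col="logstamp")
hourly = pd.date_range(raw.index.min(), raw.index.max(), freq="h")

# No string label equals a Timestamp, so every row comes back as NaN.
broken = raw.reindex(hourly)

# parse_dates=True asks read_csv to parse the index column as datetimes.
ts1 = pd.read_csv(io.StringIO(csv_text), index_col="logstamp", parse_dates=True)
hourly = pd.date_range(ts1.index.min(), ts1.index.max(), freq="h")

# Now the original rows line up with the hourly index, and interpolate
# fills the gap between them, weighted by elapsed time.
filled = ts1.reindex(hourly).interpolate(method="time")

print(broken.isna().all().all())  # were all original values lost?
print(filled.head(10))
```

Checking type(ts1.index) right after read_csv is a quick way to tell which case applies. The lowercase "h" alias is used here because recent pandas releases deprecate the uppercase "H" spelling.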

Pandas reindex converts all non-index columns to NaN
