Filling gaps in a daily time series with a Pandas DataFrame

Time: 2019-05-09 15:29:59

Tags: python pandas dataframe

I have a time series stored in a CSV file that I load into a DataFrame. It looks like this:

                         time station_id station_name value
0   2019-05-08 00:10:00+00:00    9018823     XXXXXXXX    11
1   2019-05-08 00:20:00+00:00    9018823     XXXXXXXX    10
2   2019-05-08 00:30:00+00:00    9018823     XXXXXXXX     9
3   2019-05-08 00:40:00+00:00    9018823     XXXXXXXX     9
4   2019-05-08 00:50:00+00:00    9018823     XXXXXXXX     9

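I am using Pandas to fill in the slots that are missing during the day. I only want to do this per day, from 2019-05-08 00:00:00+00:00 to 2019-05-08 23:50:00+00:00. The following fills the gaps between readings, but I cannot fill the missing entry at 00:00.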

data = data.set_index(keys=['time']).resample('10min').ffill()

Is this something I can do with pandas?

Update

Following the suggestion to use reindex, I get the full time range, but every value in the resulting DataFrame is NaN.

date_str = data['time'].iloc[0].strftime('%Y-%m-%d')
time_range = pd.date_range(date_str, date_str + ' 23:59:00', freq='10T')

data = (data.set_index(keys=['time'])
            .resample('10min').ffill()
            .reindex(time_range).bfill())

                     station_id  station_name  value
2019-05-08 00:00:00         NaN           NaN    NaN
2019-05-08 00:10:00         NaN           NaN    NaN
2019-05-08 00:20:00         NaN           NaN    NaN
2019-05-08 00:30:00         NaN           NaN    NaN
2019-05-08 00:40:00         NaN           NaN    NaN
2019-05-08 00:50:00         NaN           NaN    NaN
2019-05-08 01:00:00         NaN           NaN    NaN
2019-05-08 01:10:00         NaN           NaN    NaN
2019-05-08 01:20:00         NaN           NaN    NaN
2019-05-08 01:30:00         NaN           NaN    NaN
2019-05-08 01:40:00         NaN           NaN    NaN
2019-05-08 01:50:00         NaN           NaN    NaN

2 Answers:

Answer 0 (score: 0)

Try reindex:

# Day covered by the data
date_str = data['time'].iloc[0].strftime('%Y-%m-%d')
# The range must be tz-aware (UTC) like the data's index; a naive range matches no labels
time_range = pd.date_range(date_str, date_str + ' 23:59:00', freq='10min', tz='UTC')

data = (data.set_index(keys=['time'])
            .resample('10min').ffill()
            .reindex(time_range).bfill())
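
The all-NaN result reported in the question's update is most likely a timezone mismatch: the timestamps in the data carry +00:00 (tz-aware), while a date_range built from plain strings is tz-naive, so none of its labels match the existing index. Below is a minimal sketch of the reindex approach with a tz-aware range. The sample rows are made up to mirror the question's layout, and the names full_day and filled are only illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data (tz-aware UTC timestamps).
data = pd.DataFrame({
    "time": pd.to_datetime(
        ["2019-05-08 00:10", "2019-05-08 00:20", "2019-05-08 00:40"], utc=True
    ),
    "station_id": [9018823, 9018823, 9018823],
    "station_name": ["XXXXXXXX"] * 3,
    "value": [11, 10, 9],
})

# Midnight of the day covered by the data; normalize() keeps the timezone.
day = data["time"].iloc[0].normalize()

# 144 ten-minute slots: 00:00 through 23:50, in the same timezone as the data.
full_day = pd.date_range(day, periods=144, freq="10min")

filled = (
    data.set_index("time")
        .reindex(full_day)  # missing slots become NaN rows
        .ffill()            # carry the last reading forward
        .bfill()            # fill the leading 00:00 slot from the first reading
)

print(filled.head())
```

This assumes the readings already sit on the 10-minute grid, as they do in the question. If the timestamps can fall off the grid, resampling to '10min' before the reindex would still be needed. Note that station_id and value come back as floats, because the reindex introduces NaN before the fill.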

Answer 1 (score: 0)

The interpolate function offers several fill methods and options; maybe give it a try?

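import numpy as np
import pandas as pd

# firstDate and lastDate are the first and last timestamps to cover
date_range = pd.date_range(firstDate, lastDate, freq='10min')

df = df.reindex(date_range, fill_value=np.nan)
# axis=0 fills down the rows (along time); newer pandas prefers df.ffill() over method='pad'
df = df.interpolate(method='pad', limit_direction='forward', axis=0)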
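
As a rough illustration of what interpolate can do beyond padding, here is a small sketch that fills a missing slot by time-weighted linear interpolation. The readings are invented, and it assumes the DataFrame is already indexed by tz-aware timestamps. Unlike a forward fill, this produces new in-between values, so it only makes sense for numeric columns such as value.

```python
import pandas as pd

# Hypothetical readings with the 00:30 slot missing.
idx = pd.to_datetime(
    ["2019-05-08 00:10", "2019-05-08 00:20", "2019-05-08 00:40"], utc=True
)
df = pd.DataFrame({"value": [11.0, 10.0, 8.0]}, index=idx)

# Regular 10-minute grid between the first and last reading (tz taken from idx).
date_range = pd.date_range(idx[0], idx[-1], freq="10min")

df = df.reindex(date_range)  # the 00:30 row appears as NaN
df["value"] = df["value"].interpolate(method="time")  # weight by elapsed time

print(df)
```

With these numbers the 00:30 slot lands halfway between 10.0 and 8.0. A slot before the first reading, such as 00:00 in the question, has no earlier value to interpolate from and would still need a bfill.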