Question

In pandas, I am creating a DataFrame like this:

    df = pd.read_csv(file_path)[['timestamp', 'close']]
    df['close'] = df['close'].astype(float)
    df = df.set_index('timestamp')

The data looks like this:

                    close
timestamp                 
2019-04-18          203.86
2019-04-17          203.13
2019-04-16          199.25
2019-04-15          199.23
2019-04-12          198.87

Now I want to fill in the missing timestamp and close values using linear interpolation between the nearest neighbors.

I built the full list of dates, including the missing ones, with:

dates = pd.date_range(start=df['timestamp'].min(), end=df['timestamp'].max())

Then I reindexed:

df = df.reindex(dates).iloc[::-1]

But this produced:

                      close
timestamp
2019-04-18             NaN
2019-04-17             NaN
2019-04-16             NaN
2019-04-15             NaN
2019-04-14             NaN

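I expected this to at least carry over the previous values (although I have not yet found a good way to smoothly interpolate the missing close values). How do I express this in pandas?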

Answer 1

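If you only call read_csv and do not pass parse_dates, your index will not be in datetime format. It holds plain strings, so none of its labels match the Timestamp values in dates, and reindex returns NaN for every row. Convert the index first: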

df = df.set_index('timestamp')
df.index=pd.to_datetime(df.index)

After the conversion, you can use reindex.
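The difference can be checked with a minimal sketch. It assumes an inline CSV (the hypothetical `csv_text`) in place of the asker's file, and it builds `dates` from the index rather than from a column:

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for the asker's file.
csv_text = """timestamp,close
2019-04-18,203.86
2019-04-17,203.13
2019-04-16,199.25
2019-04-15,199.23
2019-04-12,198.87
"""

df = pd.read_csv(io.StringIO(csv_text)).set_index('timestamp')
dates = pd.date_range(start=df.index.min(), end=df.index.max())

# The index holds strings, so no label matches the Timestamps in `dates`.
all_nan = bool(df.reindex(dates)['close'].isna().all())

# After conversion the labels are Timestamps, so existing rows are kept
# and only the dates that were really absent come back as NaN.
df.index = pd.to_datetime(df.index)
reindexed = df.reindex(dates)
n_missing = int(reindexed['close'].isna().sum())

print(all_nan, n_missing)
```

With the sample data from the question, only the two dates with no row (2019-04-13 and 2019-04-14) should remain NaN after the conversion.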

Another solution is to parse the dates while reading the file:

df = pd.read_csv(file_path, parse_dates=['timestamp'])[['timestamp', 'close']]

Once the datetime conversion is done, use interpolate to fill the NaN values:

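# .loc raises KeyError for labels missing from the index (pandas 1.0+), so reindex instead
df.reindex(dates).interpolate(method='index').iloc[::-1]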
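Putting the pieces of this answer together, an end-to-end version might look like the sketch below. The inline `csv_text` and the name `filled` are illustrative assumptions, not part of the original code:

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for the asker's file.
csv_text = """timestamp,close
2019-04-18,203.86
2019-04-17,203.13
2019-04-16,199.25
2019-04-15,199.23
2019-04-12,198.87
"""

# Parse the timestamp column as datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])[['timestamp', 'close']]
df = df.set_index('timestamp')

# Every calendar day between the first and last observation.
dates = pd.date_range(start=df.index.min(), end=df.index.max())

# Reindex in ascending order, interpolate linearly with respect to the
# index values, then flip back to the newest-first order of the question.
filled = df.reindex(dates).interpolate(method='index').iloc[::-1]

print(filled.round(2))
```

Interpolating on the ascending index and reversing afterwards keeps the interpolation step independent of the display order.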

Answer 2

Try:

df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
df.resample('D').interpolate(method='index')

Output:

             close
timestamp         
2019-04-12  198.87
2019-04-13  198.99
2019-04-14  199.11
2019-04-15  199.23
2019-04-16  199.25
2019-04-17  203.13
2019-04-18  203.86
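The question also mentions simply carrying the previous value forward. A possible sketch of that variant next to the interpolated one is shown here; the DataFrame is rebuilt inline from the question's sample data, and the names `carried` and `interpolated` are illustrative assumptions:

```python
import pandas as pd

# Sample data from the question, rebuilt inline and sorted oldest-first.
df = pd.DataFrame(
    {'close': [203.86, 203.13, 199.25, 199.23, 198.87]},
    index=pd.to_datetime(
        ['2019-04-18', '2019-04-17', '2019-04-16', '2019-04-15', '2019-04-12']
    ),
).sort_index()
df.index.name = 'timestamp'

# Forward fill: each missing day repeats the last known close.
carried = df.resample('D').ffill()

# Linear interpolation with respect to the index, as in the answer above.
interpolated = df.resample('D').interpolate(method='index')

print(carried)
print(interpolated.round(2))
```

Forward filling keeps only observed prices, while interpolation produces values that never actually traded, so the choice depends on how the filled series will be used.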

Answer 3

Try this:

df.reindex(dates).align(df)[1]

Output:

+-------------+--------+
|             | close  |
+-------------+--------+
| 2019-04-12  | 198.87 |
| 2019-04-13  | NaN    |
| 2019-04-14  | NaN    |
| 2019-04-15  | 199.23 |
| 2019-04-16  | 199.25 |
| 2019-04-17  | 203.13 |
| 2019-04-18  | 203.86 |
+-------------+--------+

Reindexing missing dates in pandas, but getting NaN values
