Question

I have the following data:

import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4], [datetime(2013,11,1), datetime(2013,11, 2), datetime(2013, 11, 4)])

The missing index for November 3 corresponds to a zero value, and I want the result to look like this:

y = pd.Series([1,2,0,4], pd.date_range('2013-11-01', periods = 4))

What is the best way to convert x into y? I tried

y = pd.Series(x, pd.date_range('2013-11-1', periods = 4)).fillna(0)

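This sometimes throws an index error that I can't explain (the length of the index doesn't match the values, even though the index and the data have the same length). Is there a better way to do this?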

Answer 1

You can use pandas.Series.resample() for this:

>>> x.resample('D').fillna(0)
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4

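resample() has a fill_method parameter, but I don't know whether it can be used to replace NaN during resampling. It does look like you can take care of it with the how argument, for example:

>>> x.resample('D', how=lambda x: x.mean() if len(x) > 0 else 0)
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4

I don't know which approach is preferred. Please also look at @AndyHayden's answer: reindex() with fill_value=0 is probably the most efficient way to do this, but you would have to run your own tests.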
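The calls above reflect the pandas API at the time this answer was written. As far as I know, more recent pandas releases dropped the how argument and make resample() return a Resampler object, so an aggregation has to be called explicitly. The following is a minimal sketch of two equivalents, assuming a reasonably recent pandas; the variable names filled_sum and filled_asfreq are made up for illustration and are not part of the original answer.

```python
import pandas as pd
from datetime import datetime

x = pd.Series(
    [1, 2, 4],
    [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)],
)

# Option 1 (assumption: recent pandas): sum each daily bin.
# An empty bin sums to 0, so the missing Nov 3 becomes 0.
# Note that duplicate timestamps would be added together.
filled_sum = x.resample('D').sum()

# Option 2: convert to daily frequency and fill newly created
# dates with 0. No aggregation is involved here.
filled_asfreq = x.asfreq('D', fill_value=0)

print(filled_sum)
print(filled_asfreq)
```

Both variants should give the values 1, 2, 0, 4 for this input. They differ once the index contains duplicate dates, so which one fits depends on the data.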

Answer 2

I think I would use resample (note that if there are duplicates, it takes the mean by default):

In [11]: x.resample('D')  # you could use how='first'
Out[11]: 
2013-11-01     1
2013-11-02     2
2013-11-03   NaN
2013-11-04     4
Freq: D, dtype: float64

In [12]: x.resample('D').fillna(0)
Out[12]: 
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
Freq: D, dtype: float64

If you would prefer duplicates to raise an error, use reindex:

In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]: 
2013-11-01   1
2013-11-02   2
2013-11-03   0
2013-11-04   4
Freq: D, dtype: float64
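To illustrate the difference between the two approaches when the index contains duplicates, here is a small sketch. The series dup is a hypothetical example that is not part of the question, the date range is derived from the data rather than hard-coded, and the explicit .mean() call assumes a recent pandas where resample() no longer aggregates by default.

```python
import pandas as pd
from datetime import datetime

# Hypothetical series with a duplicated timestamp (Nov 1 appears twice)
# and a gap on Nov 2.
dup = pd.Series(
    [1, 3, 4],
    [datetime(2013, 11, 1), datetime(2013, 11, 1), datetime(2013, 11, 3)],
)

# Build the full daily range from the first and last timestamps.
full_range = pd.date_range(dup.index.min(), dup.index.max(), freq='D')

# resample combines the two Nov 1 values (here by averaging them);
# the empty Nov 2 bin is NaN until fillna replaces it with 0.
averaged = dup.resample('D').mean().fillna(0)

# reindex does not combine duplicates; recent pandas releases raise
# a ValueError here, as far as I know.
try:
    dup.reindex(full_range, fill_value=0)
    reindex_raised = False
except ValueError:
    reindex_raised = True

print(averaged)
print(reindex_raised)
```

So resample hides duplicates behind an aggregation, while reindex makes them visible as an error, which may be preferable when duplicates indicate a data problem.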

Fill in missing indices in pandas