Question

I have a pandas Series that looks like this:

Attribute      DateEvent     Value
Type A         2015-04-01    4
               2015-04-02    5
               2015-04-05    3
Type B         2015-04-01    1
               2015-04-03    4
               2015-04-05    1

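How would I make sure that missing dates are accounted for in my DateEvent index (assuming the start and end dates span the full range), and how would I convert the values to a rolling sum (say, over the past two days)? For example, Type A is missing 2015-04-03 and 2015-04-04, and Type B is missing 2015-04-02 and 2015-04-04.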

Answer 1

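I've made a few assumptions about what you want; please clarify if any of them are wrong:

You want rows with missing dates to be treated as having Value = NaN.
As a result, the past-2-day rolling sum should return NaN whenever a date inside the rolling window is missing.
You want to calculate the rolling sum within each group, Type A and Type B.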
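To make the second assumption concrete, here is a minimal sketch of how a NaN inside the window affects a rolling sum. The toy series and the names strict and lenient are made up for illustration; min_periods=1 is shown only as an alternative reading of the question, not as what the answer below uses.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one value per day, the second day is missing
values = pd.Series(
    [1.0, np.nan, 3.0, 4.0],
    index=pd.date_range("2017-04-01", periods=4, freq="D"),
)

# Default behaviour: min_periods equals the window size,
# so a window containing a NaN yields NaN
strict = values.rolling(2).sum()

# Alternative assumption: a missing day simply contributes nothing
lenient = values.rolling(2, min_periods=1).sum()

print(pd.DataFrame({"value": values, "strict": strict, "lenient": lenient}))
```

With the default settings only the last day gets a number, because it is the only window in which both days are present.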

If I've assumed correctly, the steps below should do what you want.

Create a sample data set

import io

import pandas as pd

# Sample data: neither group has a row for 2017-04-01
datastring = io.StringIO("""
Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")

# Use (Attribute, DateEvent) as a two-level index and parse the dates
s = pd.read_csv(
    datastring,
    index_col=["Attribute", "DateEvent"],
    parse_dates=["DateEvent"],
)
print(s)

This is what it looks like. Both Type A and Type B are missing 2017-04-01.

                      Value
Attribute DateEvent
Type A    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4
Type B    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4

Solution

According to this answer, you have to rebuild the index and then reindex the Series to get one that includes all the dates.

# Rebuild the index so that it contains every date for every attribute
dates = pd.date_range("2017-04-01", "2017-04-05", freq="1D")
attributes = ["Type A", "Type B"]

# Create a new MultiIndex from all (attribute, date) combinations
index = pd.MultiIndex.from_product(
    [attributes, dates], names=["Attribute", "DateEvent"]
)

# Reindex the Series; rows that did not exist before get NaN
sNew = s.reindex(index)

The missing dates have been added, with Value = NaN.

                      Value
Attribute DateEvent
Type A    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
Type B    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0

Now group the Series by the Attribute index level and apply a rolling window of size 2 with sum().

# Group the Series by the `Attribute` index level
grouped = sNew.groupby(level="Attribute")
# Apply a rolling window of 2 rows (2 days, since every date is now present)
summed = grouped.rolling(2).sum()

Final output

                                Value
Attribute Attribute DateEvent
Type A    Type A    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
Type B    Type B    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0

Final note: I'm not sure why there are now two Attribute index columns. If anyone figures it out, please let me know.
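On the duplicated Attribute level: a likely explanation (my reading, not something the answer confirms) is that groupby(...).rolling(...) puts the group key in front of the existing index, and since the index already has an Attribute level, the name shows up twice. Below is a minimal sketch, assuming a reasonably recent pandas. It rebuilds data of the same shape as sNew, and dropping the outer level with droplevel(0) is just one way to get back to two levels; deduped is a name made up for this example.

```python
import pandas as pd

# Same shape as the reindexed data above: every date for every attribute
dates = pd.date_range("2017-04-01", "2017-04-05", freq="D")
index = pd.MultiIndex.from_product(
    [["Type A", "Type B"], dates], names=["Attribute", "DateEvent"]
)
sNew = pd.DataFrame({"Value": [float("nan"), 1, 2, 3, 4] * 2}, index=index)

summed = sNew.groupby(level="Attribute").rolling(2).sum()
# The group key sits in front of the two original levels
print(list(summed.index.names))

# Drop the outermost (group key) level by position
deduped = summed.droplevel(0)
print(deduped)
```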

EDIT: It turns out a similar question was asked here. Check it out.
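For completeness, here is a sketch of the same steps applied to the data from the question, with the date range and attribute list taken from the data itself instead of being hard-coded. It assumes the question's Series can be loaded as shown; the names csv, full_index, filled and rolled are made up for this example, and the NaN handling follows the assumptions stated at the top of the answer.

```python
import io

import pandas as pd

# The question's data, typed out as CSV so the example is self-contained
csv = io.StringIO("""Attribute,DateEvent,Value
Type A,2015-04-01,4
Type A,2015-04-02,5
Type A,2015-04-05,3
Type B,2015-04-01,1
Type B,2015-04-03,4
Type B,2015-04-05,1
""")
s = pd.read_csv(
    csv, index_col=["Attribute", "DateEvent"], parse_dates=["DateEvent"]
)["Value"]

# Derive the full daily range and the attribute list from the data
all_dates = s.index.get_level_values("DateEvent")
dates = pd.date_range(all_dates.min(), all_dates.max(), freq="D")
attributes = s.index.get_level_values("Attribute").unique()

full_index = pd.MultiIndex.from_product(
    [attributes, dates], names=["Attribute", "DateEvent"]
)
filled = s.reindex(full_index)  # missing dates become NaN

# Rolling sum over 2 rows per group, then drop the extra group-key level
rolled = filled.groupby(level="Attribute").rolling(2).sum().droplevel(0)
print(rolled)
```

Under the NaN assumption, the only window with two consecutive dates present is Type A on 2015-04-02, so most of the result is NaN for this particular data.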

Source: How to fill in missing values with a multiIndex

Pandas date MultiIndex with missing dates - rolling sum
