Joining a pandas time series with a date range

Time: 2017-05-18 18:30:50

Tags: python pandas datetime time-series data-science

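I have a pandas Series indexed by date, and a pandas date range. The dates in the series are a subset of the date range. There must be a super elegant pandas way to join them and fill the missing values with zeros, but I can't think of it right now.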

pandas.date_range(start, end)

DatetimeIndex(['2017-04-20', '2017-04-21', '2017-04-22', '2017-04-23',
               '2017-04-24', '2017-04-25', '2017-04-26', '2017-04-27',
               '2017-04-28', '2017-04-29', '2017-04-30', '2017-05-01',
               '2017-05-02', '2017-05-03', '2017-05-04', '2017-05-05',
               '2017-05-06'],
              dtype='datetime64[ns]', freq='D')

data.groupby("Day").size()

Day
2017-04-20    462
2017-04-21     64
2017-04-22     13
2017-04-23      5
2017-04-24      9
2017-04-25      5
2017-04-26      1
2017-04-27      2
2017-04-30      1
2017-05-02      1
2017-05-04      1
2017-05-06      1
dtype: int64

Desired result:

Day
2017-04-20    462
2017-04-21     64
2017-04-22     13
2017-04-23      5
2017-04-24      9
2017-04-25      5
2017-04-26      1
2017-04-27      2
2017-04-28      0
2017-04-29      0
2017-04-30      1
2017-05-01      0
2017-05-02      1
2017-05-03      0
2017-05-04      1
2017-05-05      0
2017-05-06      1
dtype: int64

1 answer:

Answer 0 (score: 2)

data.groupby("Day").size().reindex(pandas.date_range(start, end), fill_value=0)

Demo

# I also named the new index :-)
data.groupby("Day").size().reindex(
    pd.date_range('2017-04-20', '2017-05-06', name='Day'), fill_value=0)

Day
2017-04-20    462
2017-04-21     64
2017-04-22     13
2017-04-23      5
2017-04-24      9
2017-04-25      5
2017-04-26      1
2017-04-27      2
2017-04-28      0
2017-04-29      0
2017-04-30      1
2017-05-01      0
2017-05-02      1
2017-05-03      0
2017-05-04      1
2017-05-05      0
2017-05-06      1
Freq: D, dtype: int64
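
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same `reindex` approach. The `data` frame below is hypothetical sample data (the question does not show its contents), and it assumes the `Day` column already holds datetime values.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `data` frame:
# one row per event, with a datetime "Day" column.
data = pd.DataFrame({
    "Day": pd.to_datetime(
        ["2017-04-20", "2017-04-20", "2017-04-21", "2017-04-23"]
    )
})

# Count rows per day; days with no rows are absent from the result.
counts = data.groupby("Day").size()

# Build the full daily range and reindex onto it, filling the gaps with 0.
full_range = pd.date_range("2017-04-20", "2017-04-24", name="Day")
filled = counts.reindex(full_range, fill_value=0)

print(filled)
```

Note that `reindex` matches index labels exactly. If `Day` held strings rather than datetimes, none of the labels would likely match the `DatetimeIndex` and every value would come back as the fill value, so converting with `pd.to_datetime` first is probably a sensible precaution.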