Question

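I want to compute a sales forecast for each product I sell, using its most recent 12 months of data.

My time series data contains the product name, the month, and the quantity purchased in that month. However, some months had no sales, and there is no row for those months.

My data frame looks like this: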

2014-06  product1  100
2014-07  product1  50
2014-10  product1  120

But I would like it to look like this:

2014-06  product1  100
2014-07  product1  50
2014-08  product1  
2014-09  product1  
2014-10  product1  120

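That is, one row for every month, not just the months that have data. What is the most efficient way to add rows for the months that have no sales data?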

Answer 1

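You can use DataFrame.reindex with a PeriodIndex, after building a new index that includes the missing months with pd.date_range followed by to_period(). First, I'll recreate your data, converting your months to Period instances: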

import pandas as pd

# Parse the month strings and convert them to monthly Periods
index = pd.to_datetime(['2014-06', '2014-07', '2014-10']).to_period('M')
data = pd.DataFrame({
        'name': 'product1',
        'count': [100, 50, 120]
    }, index=index)

Now we create a new index that contains every month in the range:

# 'MS' (month start) is used because the 'M' alias for date_range is deprecated in recent pandas
new_index = pd.date_range(
    start=index[0].to_timestamp(),
    end=index[-1].to_timestamp(),
    freq='MS').to_period('M')

This looks like:

>>> new_index
PeriodIndex(['2014-06', '2014-07', '2014-08', '2014-09', '2014-10'],
            dtype='int64', freq='M')
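
As a possible shortcut, the same index can be built without the round trip through timestamps. The sketch below assumes the original index is already a monthly PeriodIndex, as in the example above, and uses pd.period_range to enumerate every month from the first period to the last:

```python
import pandas as pd

index = pd.to_datetime(['2014-06', '2014-07', '2014-10']).to_period('M')

# period_range steps month by month from the first period to the last,
# so the gaps (2014-08 and 2014-09) are included automatically.
new_index = pd.period_range(start=index[0], end=index[-1], freq='M')
print(new_index)
```

The result can be passed to reindex in exactly the same way as the index built with pd.date_range.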

We can then reindex the data against it:

>>> res = data.reindex(new_index, method='backfill')
>>> res

         count      name
2014-06    100  product1
2014-07     50  product1
2014-08    120  product1
2014-09    120  product1
2014-10    120  product1

You'll notice that both name and count have been backfilled, whereas you only want name backfilled. We can set count to NaN for the new rows like so:

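# Months that exist in the new index but not in the original data
ix = new_index.difference(index)
# Blank out the backfilled counts for those months
res.loc[ix, 'count'] = None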

Which gives:

>>> res

         count      name
2014-06    100  product1
2014-07     50  product1
2014-08    NaN  product1
2014-09    NaN  product1
2014-10    120  product1
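
Since the question mentions several products, here is a rough sketch of how the same idea might extend to a long-format table with more than one product. The column names 'month', 'name' and 'count', the sample rows for product2, and the variable names are assumptions made for illustration. Each product is reindexed without a fill method, so count stays NaN for the added months and nothing needs to be blanked out afterwards:

```python
import pandas as pd

# Hypothetical long-format sales data: one row per (month, product) with sales.
sales = pd.DataFrame({
    'month': ['2014-06', '2014-07', '2014-10', '2014-07', '2014-09'],
    'name': ['product1', 'product1', 'product1', 'product2', 'product2'],
    'count': [100, 50, 120, 30, 40],
})
sales['month'] = pd.to_datetime(sales['month']).dt.to_period('M')

pieces = []
for name, group in sales.groupby('name'):
    # Every month between this product's first and last recorded sale
    months = pd.period_range(start=group['month'].min(),
                             end=group['month'].max(), freq='M')
    # Reindex without a fill method: months with no sales get NaN
    counts = group.set_index('month')['count'].reindex(months)
    pieces.append(pd.DataFrame({
        'month': months,
        'name': name,
        'count': counts.to_numpy(),
    }))

filled = pd.concat(pieces, ignore_index=True)
print(filled)
```

If every product should cover the same window, for example the last 12 months, the per-product period_range could be replaced by one shared range.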

(Pandas) Handling monthly data with missing months
