Pandas Grouper missing time steps

Time: 2019-09-02 06:59:19

Tags: pandas pandas-groupby

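I want to get the monthly sales total for each product, even when a product has no sales figures for some period. Consider the following example: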

import pandas as pd
import numpy as np
np.random.seed(42)

dates = pd.date_range('2001-01-01', '2001-12-31', freq='D')  # ISO dates avoid day/month ambiguity
sales = [np.random.randint(100) for _ in range(len(dates))]
product = [['A', 'B', 'C'][np.random.randint(3)] for _ in range(len(dates))]

df = pd.DataFrame({'Dates': dates, 
                   'Sales': sales,
                   'Product': product
                  })
march = df.Dates.dt.month == 3
df = df[~march]

All March data is removed. I would like these months to show up with zero sales when printed:

monthly = pd.Grouper(key='Dates', freq='M')
sum_sales = df.groupby(['Product', monthly])['Sales'].sum()

Here is sum_sales for product A only (note the missing March time step):

Product  Dates     
A        2001-01-31    658
         2001-02-28    460
         2001-04-30    541
         2001-05-31    701
         2001-06-30    517
         2001-07-31    596
         2001-08-31    802
         2001-09-30    654
         2001-10-31    561
         2001-11-30    473
         2001-12-31    605

However, if I only do df.groupby(monthly)['Sales'].sum() without grouping by product, I get the expected zero for March:

Dates
2001-01-31    1616
2001-02-28    1256
2001-03-31       0
2001-04-30    1555
2001-05-31    1384
2001-06-30    1451
2001-07-31    1677
2001-08-31    1472
2001-09-30    1535
2001-10-31    1316
2001-11-30    1573
2001-12-31    1403
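The zero appears because a lone pd.Grouper with a freq bins the data the way resample does: it creates every monthly bin between the first and last timestamp, and the sum of an empty bin is 0. The sketch below shows this on a hypothetical five-row frame (the data is made up for illustration). It uses the offset object pd.offsets.MonthEnd() instead of the 'M' alias, because pandas 2.2 deprecated 'M' in favour of 'ME' and the offset object works in both older and newer versions.

```python
import pandas as pd

# Hypothetical toy data: sales in January, February and April, nothing in March
df = pd.DataFrame({
    'Dates': pd.to_datetime(['2001-01-05', '2001-01-20', '2001-02-10',
                             '2001-04-03', '2001-04-25']),
    'Sales': [10, 5, 7, 3, 4],
    'Product': ['A', 'B', 'A', 'A', 'B'],
})

# MonthEnd() is the offset behind the 'M' alias ('ME' in pandas >= 2.2)
monthly = pd.Grouper(key='Dates', freq=pd.offsets.MonthEnd())

# A single time-based Grouper creates every monthly bin from the first to the
# last date, so the empty March bin is kept and sums to 0
by_month = df.groupby(monthly)['Sales'].sum()

# resample on the same column produces the same bins and totals
via_resample = df.set_index('Dates')['Sales'].resample(pd.offsets.MonthEnd()).sum()

print(by_month)
```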

So how can I get pandas to show the missing months as zero sales when grouping by more than one key in groupby?
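One workaround, sketched here on hypothetical toy data, is to make the month a Categorical key whose categories list every month of interest and then group with observed=False: pandas returns every Product x month combination, and combinations with no rows sum to 0. This assumes the set of months is known up front (here January to April 2001), and it swaps the Grouper for a period-based Month column (a name chosen for illustration), so the index shows periods such as 2001-03 rather than month-end dates.

```python
import pandas as pd

# Hypothetical toy data: no sales at all in March 2001
df = pd.DataFrame({
    'Dates': pd.to_datetime(['2001-01-05', '2001-02-10', '2001-04-03',
                             '2001-01-20', '2001-04-25']),
    'Sales': [10, 7, 3, 5, 4],
    'Product': ['A', 'A', 'A', 'B', 'B'],
})

# Every month that should appear in the output (assumed known in advance)
all_months = pd.period_range('2001-01', '2001-04', freq='M')

# Categorical month key: the categories carry the months that never occur in the data
df['Month'] = pd.Categorical(df['Dates'].dt.to_period('M'), categories=all_months)

# observed=False keeps unobserved category combinations; their sum is 0
sum_sales = df.groupby(['Product', 'Month'], observed=False)['Sales'].sum()
print(sum_sales)
```

Passing observed=False explicitly also avoids the FutureWarning that recent pandas versions emit about the changing default.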

1 Answer:

Answer 0 (score: 2)

I think your solution should work; this looks like a bug.

A possible workaround is to use resample instead of Grouper, chaining groupby and resample:

sum_sales = df.set_index('Dates').groupby('Product').resample('M')['Sales'].sum()

print(sum_sales)
Product  Dates     
A        2001-01-31    658
         2001-02-28    460
         2001-03-31      0
         2001-04-30    541
         2001-05-31    701
         2001-06-30    517
         2001-07-31    596
         2001-08-31    802
         2001-09-30    654
         2001-10-31    561
         2001-11-30    473
         2001-12-31    605
B        2001-01-31    589
         2001-02-28    344
         2001-03-31      0
         2001-04-30    571
         2001-05-31    347
         2001-06-30    528
         2001-07-31    663
         2001-08-31    294
         2001-09-30    238
         2001-10-31    487
         2001-11-30    503
         2001-12-31    303
C        2001-01-31    369
         2001-02-28    452
         2001-03-31      0
         2001-04-30    443
         2001-05-31    336
         2001-06-30    406
         2001-07-31    418
         2001-08-31    376
         2001-09-30    643
         2001-10-31    268
         2001-11-30    597
         2001-12-31    495
Name: Sales, dtype: int64
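One caveat with groupby plus resample: each product is binned only between its own first and last date, so March is filled above because it lies inside every product's date range, but a product that stops selling early (or starts late) still loses its trailing (or leading) months. A sketch of an alternative that reindexes the original two-key Grouper result onto the full Product x month grid, using hypothetical toy data in which product B has no sales after February:

```python
import pandas as pd

# Hypothetical toy data: nobody sells in March, and B stops selling after February
df = pd.DataFrame({
    'Dates': pd.to_datetime(['2001-01-05', '2001-02-10', '2001-04-03',
                             '2001-01-20', '2001-02-25']),
    'Sales': [10, 7, 3, 5, 4],
    'Product': ['A', 'A', 'A', 'B', 'B'],
})
month_end = pd.offsets.MonthEnd()  # same bins as the 'M' / 'ME' alias

# groupby + resample: each product is binned only over its own date span
per_group = (df.set_index('Dates')
               .groupby('Product')['Sales']
               .resample(month_end)
               .sum())

# Full grid: every product crossed with every month end in the overall range
first = month_end.rollforward(df['Dates'].min())  # 2001-01-31
last = month_end.rollforward(df['Dates'].max())   # 2001-04-30
months = pd.date_range(first, last, freq=month_end)
grid = pd.MultiIndex.from_product([sorted(df['Product'].unique()), months],
                                  names=['Product', 'Dates'])

# Two-key Grouper result (observed combinations only), then fill the gaps with 0
monthly = pd.Grouper(key='Dates', freq=month_end)
full = (df.groupby(['Product', monthly])['Sales'].sum()
          .reindex(grid, fill_value=0))
print(full)
```

The grid here is derived from the data's overall date range; if the reporting period is fixed (for example the whole of 2001), building months from explicit start and end dates gives the same result without depending on the data.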