I have a data frame containing data from several supermarkets, structured as follows:
MARKET_ID SECTOR DATE HOUR REVENUE COUPONS ITEMS
328 21 Fruits 2019-02-24 15:00:00 808.60 19 29
329 21 Fruits 2019-02-24 22:00:00 267.54 8 8
330 21 Fruits 2019-02-26 17:00:00 350.89 10 14
331 21 Dairy 2019-02-26 07:00:00 72.89 2 2
332 21 Dairy 2019-03-03 15:00:00 122.69 4 4
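Some notes:
[HOUR] runs from "00:00:00" to "23:00:00" (24 entries per date).
My "composite key" would be the combination of [MARKET_ID], [SECTOR], [DATE], and [HOUR], but I am not using a MultiIndex in this data frame.
Hours with no sales (no revenue, coupons, or items) do not appear as rows in the data I receive.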
I would like to fill in those missing rows in the data frame, like this:
MARKET_ID SECTOR DATE HOUR REVENUE COUPONS ITEMS
328 21 Fruits 2019-02-24 14:00:00 0 0 0
While searching, I came across solutions that use reindex or grouper, but I am not sure whether they fit my problem. Any suggestions?
Thanks for your attention.
Answer 0 (score: 1)
Create a combined date-and-time column:
df['DATETIME'] = pd.to_datetime(df['DATE'] + ' ' + df['HOUR'])
Drop the now-redundant columns:
df.drop(['DATE','HOUR'], inplace=True, axis = 1)
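The string concatenation above assumes that both DATE and HOUR are stored as strings. As a small sketch on two rows taken from the question (the variable names from_strings and from_parts are only illustrative), the same timestamp can also be built by parsing the date and adding the hour as a timedelta, which should also work when DATE is already a datetime column:

```python
import pandas as pd

# Two rows copied from the question; both columns are plain strings here.
df = pd.DataFrame({
    "DATE": ["2019-02-24", "2019-02-26"],
    "HOUR": ["15:00:00", "07:00:00"],
})

# Variant 1: concatenate the strings, then parse them in one go.
from_strings = pd.to_datetime(df["DATE"] + " " + df["HOUR"])

# Variant 2: parse the date and add the hour as a timedelta.
# This variant does not require DATE to be a string column.
from_parts = pd.to_datetime(df["DATE"]) + pd.to_timedelta(df["HOUR"])

# Both variants should describe the same points in time.
print((from_strings == from_parts).all())
```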
Now group by MARKET_ID and SECTOR, resample with the H (hourly) frequency, and fill the missing values with 0:
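# Resample only the numeric columns; recent pandas versions can raise
# when asked to average the string column SECTOR.
df.groupby(['MARKET_ID', 'SECTOR']).\
apply(lambda x : x.set_index('DATETIME')[['REVENUE', 'COUPONS', 'ITEMS']].resample('H').mean().fillna(0))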
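For reference, here is a self-contained sketch of this answer run on the five sample rows from the question. It is an illustration under a few assumptions rather than the definitive version: the value_cols list and the final reset_index() are my additions (the latter to get back a flat frame without a MultiIndex), and the frequency is written as "h" because newer pandas releases deprecate the "H" spelling.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "MARKET_ID": [21, 21, 21, 21, 21],
    "SECTOR": ["Fruits", "Fruits", "Fruits", "Dairy", "Dairy"],
    "DATE": ["2019-02-24", "2019-02-24", "2019-02-26", "2019-02-26", "2019-03-03"],
    "HOUR": ["15:00:00", "22:00:00", "17:00:00", "07:00:00", "15:00:00"],
    "REVENUE": [808.60, 267.54, 350.89, 72.89, 122.69],
    "COUPONS": [19, 8, 10, 2, 4],
    "ITEMS": [29, 8, 14, 2, 4],
})

# Combine date and hour into a single timestamp column.
df["DATETIME"] = pd.to_datetime(df["DATE"] + " " + df["HOUR"])

# Assumption: only these numeric columns need to be filled.
value_cols = ["REVENUE", "COUPONS", "ITEMS"]

# Resample each (MARKET_ID, SECTOR) group to hourly rows and zero-fill the gaps.
filled = (
    df.groupby(["MARKET_ID", "SECTOR"])
      .apply(lambda g: g.set_index("DATETIME")[value_cols]
                        .resample("h").mean().fillna(0))
      .reset_index()  # back to flat columns instead of a MultiIndex
)

print(filled.head())
```

Note that each group is only filled between its own first and last observed hour, so hours before the first sale of a group (such as 14:00 on 2019-02-24 for Fruits) are not created by this approach.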
Answer 1 (score: 1)
You can use resample here:
# df['DATE'] = pd.to_datetime(df['DATE'])
# df['HOUR'] = pd.to_timedelta(df['HOUR'])
grp = df.set_index(df['DATE']+df['HOUR']).groupby(['MARKET_ID', 'SECTOR'],
sort=False).resample('H').sum().reset_index(level=1)
SECTOR MARKET_ID REVENUE COUPONS ITEMS
MARKET_ID
21 2019-02-24 15:00:00 Fruits 21 808.60 19 29
2019-02-24 16:00:00 Fruits 0 0.00 0 0
2019-02-24 17:00:00 Fruits 0 0.00 0 0
2019-02-24 18:00:00 Fruits 0 0.00 0 0
2019-02-24 19:00:00 Fruits 0 0.00 0 0
2019-02-24 20:00:00 Fruits 0 0.00 0 0
2019-02-24 21:00:00 Fruits 0 0.00 0 0
2019-02-24 22:00:00 Fruits 21 267.54 8 8
2019-02-24 23:00:00 Fruits 0 0.00 0 0
2019-02-25 00:00:00 Fruits 0 0.00 0 0
2019-02-25 01:00:00 Fruits 0 0.00 0 0
2019-02-25 02:00:00 Fruits 0 0.00 0 0
2019-02-25 03:00:00 Fruits 0 0.00 0 0
2019-02-25 04:00:00 Fruits 0 0.00 0 0
2019-02-25 05:00:00 Fruits 0 0.00 0 0
2019-02-25 06:00:00 Fruits 0 0.00 0 0
2019-02-25 07:00:00 Fruits 0 0.00 0 0
2019-02-25 08:00:00 Fruits 0 0.00 0 0
2019-02-25 09:00:00 Fruits 0 0.00 0 0
2019-02-25 10:00:00 Fruits 0 0.00 0 0
2019-02-25 11:00:00 Fruits 0 0.00 0 0
2019-02-25 12:00:00 Fruits 0 0.00 0 0
2019-02-25 13:00:00 Fruits 0 0.00 0 0
2019-02-25 14:00:00 Fruits 0 0.00 0 0
2019-02-25 15:00:00 Fruits 0 0.00 0 0
2019-02-25 16:00:00 Fruits 0 0.00 0 0
2019-02-25 17:00:00 Fruits 0 0.00 0 0
2019-02-25 18:00:00 Fruits 0 0.00 0 0
2019-02-25 19:00:00 Fruits 0 0.00 0 0
2019-02-25 20:00:00 Fruits 0 0.00 0 0
2019-02-25 21:00:00 Fruits 0 0.00 0 0
2019-02-25 22:00:00 Fruits 0 0.00 0 0
2019-02-25 23:00:00 Fruits 0 0.00 0 0
2019-02-26 00:00:00 Fruits 0 0.00 0 0
2019-02-26 01:00:00 Fruits 0 0.00 0 0
2019-02-26 02:00:00 Fruits 0 0.00 0 0
2019-02-26 03:00:00 Fruits 0 0.00 0 0
2019-02-26 04:00:00 Fruits 0 0.00 0 0
2019-02-26 05:00:00 Fruits 0 0.00 0 0
2019-02-26 06:00:00 Fruits 0 0.00 0 0
2019-02-26 07:00:00 Fruits 0 0.00 0 0
2019-02-26 08:00:00 Fruits 0 0.00 0 0
2019-02-26 09:00:00 Fruits 0 0.00 0 0
2019-02-26 10:00:00 Fruits 0 0.00 0 0
2019-02-26 11:00:00 Fruits 0 0.00 0 0
2019-02-26 12:00:00 Fruits 0 0.00 0 0
2019-02-26 13:00:00 Fruits 0 0.00 0 0
2019-02-26 14:00:00 Fruits 0 0.00 0 0
2019-02-26 15:00:00 Fruits 0 0.00 0 0
2019-02-26 16:00:00 Fruits 0 0.00 0 0
... ... ... ... ... ...
2019-03-01 14:00:00 Dairy 0 0.00 0 0
2019-03-01 15:00:00 Dairy 0 0.00 0 0
2019-03-01 16:00:00 Dairy 0 0.00 0 0
2019-03-01 17:00:00 Dairy 0 0.00 0 0
2019-03-01 18:00:00 Dairy 0 0.00 0 0
2019-03-01 19:00:00 Dairy 0 0.00 0 0
2019-03-01 20:00:00 Dairy 0 0.00 0 0
2019-03-01 21:00:00 Dairy 0 0.00 0 0
2019-03-01 22:00:00 Dairy 0 0.00 0 0
2019-03-01 23:00:00 Dairy 0 0.00 0 0
2019-03-02 00:00:00 Dairy 0 0.00 0 0
2019-03-02 01:00:00 Dairy 0 0.00 0 0
2019-03-02 02:00:00 Dairy 0 0.00 0 0
2019-03-02 03:00:00 Dairy 0 0.00 0 0
2019-03-02 04:00:00 Dairy 0 0.00 0 0
2019-03-02 05:00:00 Dairy 0 0.00 0 0
2019-03-02 06:00:00 Dairy 0 0.00 0 0
2019-03-02 07:00:00 Dairy 0 0.00 0 0
2019-03-02 08:00:00 Dairy 0 0.00 0 0
2019-03-02 09:00:00 Dairy 0 0.00 0 0
2019-03-02 10:00:00 Dairy 0 0.00 0 0
2019-03-02 11:00:00 Dairy 0 0.00 0 0
2019-03-02 12:00:00 Dairy 0 0.00 0 0
2019-03-02 13:00:00 Dairy 0 0.00 0 0
2019-03-02 14:00:00 Dairy 0 0.00 0 0
2019-03-02 15:00:00 Dairy 0 0.00 0 0
2019-03-02 16:00:00 Dairy 0 0.00 0 0
2019-03-02 17:00:00 Dairy 0 0.00 0 0
2019-03-02 18:00:00 Dairy 0 0.00 0 0
2019-03-02 19:00:00 Dairy 0 0.00 0 0
2019-03-02 20:00:00 Dairy 0 0.00 0 0
2019-03-02 21:00:00 Dairy 0 0.00 0 0
2019-03-02 22:00:00 Dairy 0 0.00 0 0
2019-03-02 23:00:00 Dairy 0 0.00 0 0
2019-03-03 00:00:00 Dairy 0 0.00 0 0
2019-03-03 01:00:00 Dairy 0 0.00 0 0
2019-03-03 02:00:00 Dairy 0 0.00 0 0
2019-03-03 03:00:00 Dairy 0 0.00 0 0
2019-03-03 04:00:00 Dairy 0 0.00 0 0
2019-03-03 05:00:00 Dairy 0 0.00 0 0
2019-03-03 06:00:00 Dairy 0 0.00 0 0
2019-03-03 07:00:00 Dairy 0 0.00 0 0
2019-03-03 08:00:00 Dairy 0 0.00 0 0
2019-03-03 09:00:00 Dairy 0 0.00 0 0
2019-03-03 10:00:00 Dairy 0 0.00 0 0
2019-03-03 11:00:00 Dairy 0 0.00 0 0
2019-03-03 12:00:00 Dairy 0 0.00 0 0
2019-03-03 13:00:00 Dairy 0 0.00 0 0
2019-03-03 14:00:00 Dairy 0 0.00 0 0
2019-03-03 15:00:00 Dairy 21 122.69 4 4
[180 rows x 5 columns]
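One thing to keep in mind: resample only fills the hours between the first and the last observation of each group, so the output above starts at 15:00 on 2019-02-24 and does not contain the 14:00 row shown in the question. Since the question also mentions reindex, below is a rough sketch of that route, assuming each (MARKET_ID, SECTOR) group should cover whole days from 00:00 to 23:00 between its first and last observed date. The helper name fill_whole_days and the value_cols list are hypothetical, and the sketch relies on the composite key being unique, because reindex needs a unique index.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "MARKET_ID": [21, 21, 21, 21, 21],
    "SECTOR": ["Fruits", "Fruits", "Fruits", "Dairy", "Dairy"],
    "DATE": ["2019-02-24", "2019-02-24", "2019-02-26", "2019-02-26", "2019-03-03"],
    "HOUR": ["15:00:00", "22:00:00", "17:00:00", "07:00:00", "15:00:00"],
    "REVENUE": [808.60, 267.54, 350.89, 72.89, 122.69],
    "COUPONS": [19, 8, 10, 2, 4],
    "ITEMS": [29, 8, 14, 2, 4],
})
df["DATETIME"] = pd.to_datetime(df["DATE"] + " " + df["HOUR"])

# Assumption: only these numeric columns need to be filled.
value_cols = ["REVENUE", "COUPONS", "ITEMS"]


def fill_whole_days(group):
    """Reindex one group onto every hour of every day it spans (hypothetical helper)."""
    # From midnight of the first observed day to 23:00 of the last observed day.
    start = group["DATETIME"].min().normalize()
    end = group["DATETIME"].max().normalize() + pd.Timedelta(hours=23)
    full_hours = pd.date_range(start, end, freq="h", name="DATETIME")
    # Hours without a matching row are created and filled with 0.
    return group.set_index("DATETIME")[value_cols].reindex(full_hours, fill_value=0)


filled = (
    df.groupby(["MARKET_ID", "SECTOR"])
      .apply(fill_whole_days)
      .reset_index()  # flat columns, no MultiIndex
)

# The row requested in the question (Fruits, 2019-02-24 14:00) now exists.
wanted = filled[(filled["SECTOR"] == "Fruits")
                & (filled["DATETIME"] == pd.Timestamp("2019-02-24 14:00:00"))]
print(wanted)
```

If every group should instead share one global date range, the start and end values could be computed once from the whole frame rather than per group.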