I have a DataFrame that holds the daily demand for different products at different stores.
SKU Store F LeadTime Date Qty Value Price Level
0 504777 1 135828 11 2018-01-22 1 3.99 3.99 45
1 504777 1 135828 11 2018-01-23 0 0.00 0.00 45
2 504777 1 135828 11 2018-01-24 3 11.97 3.99 42
3 504777 1 135828 11 2018-01-25 1 3.99 3.99 41
4 504777 1 135828 11 2018-01-26 0 0.00 0.00 41
300 704777 2 135828 11 2018-01-22 1 4.99 3.99 45
301 704777 2 135828 11 2018-01-23 0 0.00 0.00 47
302 704777 2 135828 11 2018-01-24 4 12.97 3.99 48
303 704777 2 135828 11 2018-01-25 1 3.99 3.99 49
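Using this example, I want to complete the dataset up to 2018-01-31 with the following rules:
the SKU, Store, F, LeadTime and Level columns should be filled with their last value (Date simply continues day by day);
the Qty, Value and Price columns should be filled with 0.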
So my expected output should look like this:
SKU Store F LeadTime Date Qty Value Price Level
0 504777 1 135828 11 2018-01-22 1 3.99 3.99 45
1 504777 1 135828 11 2018-01-23 0 0.00 0.00 45
2 504777 1 135828 11 2018-01-24 3 11.97 3.99 42
3 504777 1 135828 11 2018-01-25 1 3.99 3.99 41
4 504777 1 135828 11 2018-01-26 0 0.00 0.00 41
5 504777 1 135828 11 2018-01-27 0 0.00 0.00 41
6 504777 1 135828 11 2018-01-28 0 0.00 0.00 41
7 504777 1 135828 11 2018-01-29 0 0.00 0.00 41
8 504777 1 135828 11 2018-01-30 0 0.00 0.00 41
9 504777 1 135828 11 2018-01-31 0 0.00 0.00 41
300 704777 2 135828 11 2018-01-22 1 4.99 3.99 45
301 704777 2 135828 11 2018-01-23 0 0.00 0.00 47
302 704777 2 135828 11 2018-01-24 4 12.97 3.99 48
303 704777 2 135828 11 2018-01-25 1 3.99 3.99 49
304 704777 2 135828 11 2018-01-26 0 0.00 0.00 49
305 704777 2 135828 11 2018-01-27 0 0.00 0.00 49
306 704777 2 135828 11 2018-01-28 0 0.00 0.00 49
307 704777 2 135828 11 2018-01-29 0 0.00 0.00 49
308 704777 2 135828 11 2018-01-30 0 0.00 0.00 49
309 704777 2 135828 11 2018-01-31 0 0.00 0.00 49
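The two fill rules can be sketched for a single SKU/Store group as follows. This is a minimal illustration, not a full answer: `fill_group`, `ZERO_COLS` and `CARRY_COLS` are hypothetical names, and the group is the 704777/2 rows from the sample typed in by hand.

```python
import pandas as pd

ZERO_COLS = ['Qty', 'Value', 'Price']                    # hypothetical: set to 0 on new days
CARRY_COLS = ['SKU', 'Store', 'F', 'LeadTime', 'Level']  # hypothetical: carry last value forward

def fill_group(g: pd.DataFrame, end: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical helper: extend one SKU/Store group with daily rows up to `end`."""
    days = pd.date_range(g['Date'].min(), end, freq='D')
    out = g.set_index('Date').reindex(days)      # missing days appear as all-NaN rows
    out[ZERO_COLS] = out[ZERO_COLS].fillna(0)    # rule 2: demand columns become 0
    out[CARRY_COLS] = out[CARRY_COLS].ffill()    # rule 1: repeat the last known value
    return out.rename_axis('Date').reset_index()

# The 704777 / Store 2 rows from the sample above
g = pd.DataFrame({
    'SKU': [704777] * 4, 'Store': [2] * 4, 'F': [135828] * 4, 'LeadTime': [11] * 4,
    'Date': pd.to_datetime(['2018-01-22', '2018-01-23', '2018-01-24', '2018-01-25']),
    'Qty': [1, 0, 4, 1], 'Value': [4.99, 0.0, 12.97, 3.99],
    'Price': [3.99, 0.0, 3.99, 3.99], 'Level': [45, 47, 48, 49],
})
result = fill_group(g, pd.Timestamp('2018-01-31'))
print(result.tail(3))
```

Zero-filling happens before the forward fill on purpose: if `ffill` ran over every column, the last Qty/Value/Price would be repeated instead of becoming 0.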
What I tried:
df = df.set_index('Date').groupby(['SKU', 'Store']).date_range(end = '2018-01-31', freq='D').agg({
'F':'last',
'LeadTime':'last',
'Price':0,
'Value':0,
'Qty':0,
'Level':'last'}).reset_index()
But this is not the right approach; it fails with:
'DataFrameGroupBy' object has no attribute 'date_range'
PS: each product has a different start date.
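The error occurs because `date_range` is a module-level pandas function (`pd.date_range`), not a method of a GroupBy object. Since each product starts on a different date, one option is to compute each group's first date and build one daily range per group. A minimal sketch on a hypothetical three-row frame (the 2018-01-24 start is made up to show differing start dates):

```python
import pandas as pd

# Hypothetical data: the two products start on different days
df = pd.DataFrame({
    'SKU': [504777, 504777, 704777],
    'Store': [1, 1, 2],
    'Date': pd.to_datetime(['2018-01-22', '2018-01-26', '2018-01-24']),
})
end = pd.Timestamp('2018-01-31')

# pd.date_range is called on the pandas module, once per group,
# starting from that group's own first date.
starts = df.groupby(['SKU', 'Store'])['Date'].min()
ranges = {key: pd.date_range(start, end, freq='D') for key, start in starts.items()}

for key, days in ranges.items():
    print(key, days[0].date(), days[-1].date(), len(days))
```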
Answer 0 (score: 1)
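First group by SKU and Store.
For each group, create a date_range for the days after the group's latest Date, ending at 2018-01-31.
Note: I use a list comprehension here to gain some speed.
Then fillna the Qty, Value and Price columns with 0, as required.
Finally, concat all the group DataFrames and forward-fill (ffill) the remaining columns: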
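import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# Start one day after each group's last date so that date is not duplicated
dfs = [pd.concat([d, pd.DataFrame({'Date': pd.date_range(start=d['Date'].max() + pd.Timedelta(days=1), end=pd.Timestamp(2018, 1, 31))})], ignore_index=True, sort=False) for _, d in df.groupby(['SKU', 'Store'])]
for g in dfs:
    g[['Qty', 'Value', 'Price']] = g[['Qty', 'Value', 'Price']].fillna(0)
# New rows follow their own group's rows, so ffill copies that group's last values
df = pd.concat(dfs, ignore_index=True, sort=False).ffill()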
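For reference, here is this concat-and-ffill approach run end to end on the question's sample data, written as a plain loop. The new range starts the day after each group's last date so that no (SKU, Store, Date) row is duplicated. The forward fill stays inside each group only under the assumption that every group's first row has no missing values.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    'SKU': [504777] * 5 + [704777] * 4,
    'Store': [1] * 5 + [2] * 4,
    'F': [135828] * 9,
    'LeadTime': [11] * 9,
    'Date': pd.to_datetime(['2018-01-22', '2018-01-23', '2018-01-24', '2018-01-25',
                            '2018-01-26', '2018-01-22', '2018-01-23', '2018-01-24',
                            '2018-01-25']),
    'Qty': [1, 0, 3, 1, 0, 1, 0, 4, 1],
    'Value': [3.99, 0.0, 11.97, 3.99, 0.0, 4.99, 0.0, 12.97, 3.99],
    'Price': [3.99, 0.0, 3.99, 3.99, 0.0, 3.99, 0.0, 3.99, 3.99],
    'Level': [45, 45, 42, 41, 41, 45, 47, 48, 49],
}, index=[0, 1, 2, 3, 4, 300, 301, 302, 303])
end = pd.Timestamp(2018, 1, 31)

dfs = []
for _, d in df.groupby(['SKU', 'Store']):
    # Only the days after the group's last date; start > end yields an empty range
    new_days = pd.date_range(d['Date'].max() + pd.Timedelta(days=1), end, freq='D')
    g = pd.concat([d, pd.DataFrame({'Date': new_days})], ignore_index=True, sort=False)
    g[['Qty', 'Value', 'Price']] = g[['Qty', 'Value', 'Price']].fillna(0)
    dfs.append(g)

# Each group's appended rows directly follow its own rows, so ffill copies its last values
out = pd.concat(dfs, ignore_index=True, sort=False).ffill()
print(out.groupby(['SKU', 'Store'])['Date'].agg(['min', 'max', 'count']))
```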
Answer 1 (score: 1)
I suggest reindexing each group onto a daily date range. Store each reindexed group in a list, then build a single DataFrame from that list.
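A minimal sketch, on a hypothetical three-day frame, of what reindexing onto a longer `pd.date_range` does: existing dates keep their rows, and every new date becomes an all-NaN row that still has to be filled.

```python
import pandas as pd

# Hypothetical frame indexed by date
s = pd.DataFrame(
    {'Qty': [1, 0, 3]},
    index=pd.to_datetime(['2018-01-22', '2018-01-23', '2018-01-24']),
)

# Existing dates are kept; 2018-01-25..27 are inserted as NaN rows
full = s.reindex(pd.date_range('2018-01-22', '2018-01-27', freq='D'))
print(full)
```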
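import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
# Fixed end date from the question (adding pd.offsets.MonthEnd() to a date that
# is already a month end would roll it into the next month)
end_date = pd.Timestamp('2018-01-31')
dfs = []
for _, d in df.groupby(['SKU', 'Store']):
    start_date = d['Date'].min()  # each product has its own start date
    d = d.set_index('Date')
    # One row per day; dates missing from the group become all-NaN rows
    d = d.reindex(pd.date_range(start_date, end_date, freq='D'))
    dfs.append(d)
new_df = pd.concat(dfs)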
new_df
SKU Store F LeadTime Qty Value Price Level
2018-01-22 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 45.0
2018-01-23 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 45.0
2018-01-24 504777.0 1.0 135828.0 11.0 3.0 11.97 3.99 42.0
2018-01-25 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 41.0
2018-01-26 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-27 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-28 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-29 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-30 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-22 704777.0 2.0 135828.0 11.0 1.0 4.99 3.99 45.0
2018-01-23 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 47.0
2018-01-24 704777.0 2.0 135828.0 11.0 4.0 12.97 3.99 48.0
2018-01-25 704777.0 2.0 135828.0 11.0 1.0 3.99 3.99 49.0
2018-01-26 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-27 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-28 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-29 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-30 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN
Then fill Price, Qty and Value with 0, and use ffill to fill the remaining NaN values.
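The order of the two steps matters: zero-fill Qty/Value/Price first, then forward-fill. A small sketch on a hypothetical two-column frame shows what happens otherwise:

```python
import numpy as np
import pandas as pd

# Hypothetical: one real day followed by two new (NaN) days
d = pd.DataFrame({'Qty': [3.0, np.nan, np.nan], 'Level': [41.0, np.nan, np.nan]})

wrong = d.ffill()                           # Qty would repeat 3 on the new days

right = d.copy()
right[['Qty']] = right[['Qty']].fillna(0)   # zero-fill the demand column first...
right = right.ffill()                       # ...then carry Level forward
print(right)
```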
new_df = pd.concat(dfs)
new_df[['Price', 'Qty', 'Value']] = new_df[['Price', 'Qty', 'Value']].fillna(0)
new_df.ffill(inplace=True)
new_df
Out[17]:
SKU Store F LeadTime Qty Value Price Level
2018-01-22 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 45.0
2018-01-23 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 45.0
2018-01-24 504777.0 1.0 135828.0 11.0 3.0 11.97 3.99 42.0
2018-01-25 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 41.0
2018-01-26 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-27 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-28 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-29 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-30 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-31 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-22 704777.0 2.0 135828.0 11.0 1.0 4.99 3.99 45.0
2018-01-23 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 47.0
2018-01-24 704777.0 2.0 135828.0 11.0 4.0 12.97 3.99 48.0
2018-01-25 704777.0 2.0 135828.0 11.0 1.0 3.99 3.99 49.0
2018-01-26 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-27 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-28 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-29 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-30 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-31 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
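Because the NaN rows force the columns to float and Date ends up as the index, a final step can restore the question's layout. A minimal sketch on two hypothetical rows shaped like `new_df` above; the choice of integer columns is an assumption based on the sample data:

```python
import pandas as pd

# Two rows shaped like new_df: float columns, dates in the index
new_df = pd.DataFrame(
    {'SKU': [504777.0, 504777.0], 'Store': [1.0, 1.0], 'F': [135828.0, 135828.0],
     'LeadTime': [11.0, 11.0], 'Qty': [1.0, 0.0], 'Value': [3.99, 0.0],
     'Price': [3.99, 0.0], 'Level': [45.0, 41.0]},
    index=pd.to_datetime(['2018-01-22', '2018-01-31']),
)

int_cols = ['SKU', 'Store', 'F', 'LeadTime', 'Qty', 'Level']  # assumed integer columns
tidy = (new_df.rename_axis('Date')        # name the index so it becomes a 'Date' column
              .reset_index()
              .astype({c: 'int64' for c in int_cols}))
# Reorder to match the question's column order
tidy = tidy[['SKU', 'Store', 'F', 'LeadTime', 'Date', 'Qty', 'Value', 'Price', 'Level']]
print(tidy)
```

The cast to `int64` only works once no NaN remains, so it belongs after the fillna/ffill step.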