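Fill the datetime column for every group up to a specific date

Asked: 2019-04-30 17:38:04

Tags: python pandas datetime

I have a dataframe that represents the daily demand for different products in different stores.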

     SKU    Store    F  LeadTime    Date    Qty Value   Price   Level   
0   504777      1   135828  11  2018-01-22  1   3.99    3.99    45  
1   504777      1   135828  11  2018-01-23  0   0.00    0.00    45  
2   504777      1   135828  11  2018-01-24  3   11.97   3.99    42  
3   504777      1   135828  11  2018-01-25  1   3.99    3.99    41  
4   504777      1   135828  11  2018-01-26  0   0.00    0.00    41  


300 704777      2   135828  11  2018-01-22  1   4.99    3.99    45  
301 704777      2   135828  11  2018-01-23  0   0.00    0.00    47  
302 704777      2   135828  11  2018-01-24  4   12.97   3.99    48  
303 704777      2   135828  11  2018-01-25  1   3.99    3.99    49  

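Using this example, I want to complete the dataset up to 2018-01-31, under these conditions:

  • The SKU, Store, F, LeadTime and Level columns should be filled with their last value, and Date should continue day by day.

  • The Qty, Value and Price columns should be filled with 0.

So my expected output should look like this: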

     SKU    Store    F  LeadTime    Date    Qty Value   Price   Level   
0   504777      1   135828  11  2018-01-22  1   3.99    3.99    45  
1   504777      1   135828  11  2018-01-23  0   0.00    0.00    45  
2   504777      1   135828  11  2018-01-24  3   11.97   3.99    42  
3   504777      1   135828  11  2018-01-25  1   3.99    3.99    41  
4   504777      1   135828  11  2018-01-26  0   0.00    0.00    41  
5   504777      1   135828  11  2018-01-27  0   0.00    0.00    41  
6   504777      1   135828  11  2018-01-28  0   0.00    0.00    41  
7   504777      1   135828  11  2018-01-29  0   0.00    0.00    41  
8   504777      1   135828  11  2018-01-30  0   0.00    0.00    41  
9   504777      1   135828  11  2018-01-31  0   0.00    0.00    41  

300 704777      2   135828  11  2018-01-22  1   4.99    3.99    45  
301 704777      2   135828  11  2018-01-23  0   0.00    0.00    47  
302 704777      2   135828  11  2018-01-24  4   12.97   3.99    48  
303 704777      2   135828  11  2018-01-25  1   3.99    3.99    49
304 704777      2   135828  11  2018-01-26  0   0.00    0.00    49  
305 704777      2   135828  11  2018-01-27  0   0.00    0.00    49  
306 704777      2   135828  11  2018-01-28  0   0.00    0.00    49  
307 704777      2   135828  11  2018-01-29  0   0.00    0.00    49  
308 704777      2   135828  11  2018-01-30  0   0.00    0.00    49  
309 704777      2   135828  11  2018-01-31  0   0.00    0.00    49  

I tried:

df = df.set_index('Date').groupby(['SKU', 'Store']).date_range(end = '2018-01-31', freq='D').agg({
                                             'F':'last',
                                             'LeadTime':'last',
                                             'Price':0,
                                             'Value':0,
                                             'Qty':0,
                                             'Level':'last'}).reset_index()

But this is not the right way to do it:

'DataFrameGroupBy' object has no attribute 'date_range'

PS: Each product has a different start date.
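
For context, the error arises because date_range is a top-level pandas function (pd.date_range), not a method of a groupby object. A minimal sketch of calling it once per group is shown here; the three-row frame is made up for illustration and is not the data above:

```python
import pandas as pd

# Illustrative frame only; the values are invented for this sketch
df = pd.DataFrame({
    "SKU": [504777, 504777, 704777],
    "Store": [1, 1, 2],
    "Date": pd.to_datetime(["2018-01-25", "2018-01-26", "2018-01-25"]),
})

end = pd.Timestamp("2018-01-31")

# pd.date_range is called per group, starting the day after the group's last date
missing = {
    key: pd.date_range(d["Date"].max() + pd.Timedelta(days=1), end, freq="D")
    for key, d in df.groupby(["SKU", "Store"])
}
```

Each entry of missing then holds the dates that still have to be added for that (SKU, Store) pair.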

2 answers:

Answer 0: (score: 1)

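First, groupby on SKU and Store.

At the same time, you can create a date_range for each group whose start is derived from the maximum Date in that group and whose end is 2018-01-31.

Note: I use a list comprehension here for the sake of speed.

Then fillna the Qty, Value and Price columns with 0 as needed.

Finally, concat all the groupby dataframes and use forward fill (ffill) for the remaining columns.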

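import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])

# Per (SKU, Store) group, append one NaN row per day after the group's last date, up to 2018-01-31
dfs = [pd.concat([d, pd.DataFrame({'Date': pd.date_range(start=d['Date'].max() + pd.Timedelta(days=1), end=pd.Timestamp(2018, 1, 31))})], ignore_index=True, sort=False) for _, d in df.groupby(['SKU', 'Store'])]

# The appended rows are days without demand, so their quantities are 0
for d in dfs:
    d[['Qty', 'Value', 'Price']] = d[['Qty', 'Value', 'Price']].fillna(0)

# SKU, Store, F, LeadTime and Level take the last known value
df = pd.concat(dfs, ignore_index=True, sort=False).ffill()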
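
One possible variant of these steps repeats each group's last row instead of appending NaN rows and forward-filling, which keeps the integer columns as integers. The sketch below runs on the sample rows from the question; extend_group is a hypothetical helper name, not something from pandas or from the answer.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "SKU": [504777] * 5 + [704777] * 4,
    "Store": [1] * 5 + [2] * 4,
    "F": [135828] * 9,
    "LeadTime": [11] * 9,
    "Date": pd.to_datetime([
        "2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25", "2018-01-26",
        "2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25",
    ]),
    "Qty": [1, 0, 3, 1, 0, 1, 0, 4, 1],
    "Value": [3.99, 0.0, 11.97, 3.99, 0.0, 4.99, 0.0, 12.97, 3.99],
    "Price": [3.99, 0.0, 3.99, 3.99, 0.0, 3.99, 0.0, 3.99, 3.99],
    "Level": [45, 45, 42, 41, 41, 45, 47, 48, 49],
})


def extend_group(d, end):
    """Hypothetical helper: append zero-demand rows after the group's last date."""
    d = d.sort_values("Date")
    extra_dates = pd.date_range(d["Date"].max() + pd.Timedelta(days=1), end, freq="D")
    if extra_dates.empty:
        return d
    # Repeat the last row so SKU, Store, F, LeadTime and Level keep their last values
    extra = pd.concat([d.tail(1)] * len(extra_dates), ignore_index=True)
    extra["Date"] = extra_dates
    extra[["Qty", "Value", "Price"]] = 0
    return pd.concat([d, extra], ignore_index=True)


end = pd.Timestamp("2018-01-31")
result = pd.concat(
    [extend_group(d, end) for _, d in df.groupby(["SKU", "Store"])],
    ignore_index=True,
)
```

Because no NaN is ever introduced, SKU, Store, F, LeadTime, Qty and Level are not upcast to float.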

Answer 1: (score: 1)

I suggest you try reindex on each group. Create a list to store each reindexed group, then build a single DataFrame from that list.

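import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])

dfs = []
for _, d in df.groupby(['SKU', 'Store']):
    # Each group runs from its own first date to the end of that month;
    # MonthEnd(0) leaves a start date that is already a month end unchanged
    start_date = d['Date'].iloc[0]
    end_date = start_date + pd.offsets.MonthEnd(0)

    # Reindexing on the full daily range inserts NaN rows for the missing days
    d = d.set_index('Date').reindex(pd.date_range(start_date, end_date))
    dfs.append(d)

new_df = pd.concat(dfs)
new_df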

                 SKU  Store         F  LeadTime  Qty  Value  Price  Level
2018-01-22  504777.0    1.0  135828.0      11.0  1.0   3.99   3.99   45.0
2018-01-23  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   45.0
2018-01-24  504777.0    1.0  135828.0      11.0  3.0  11.97   3.99   42.0
2018-01-25  504777.0    1.0  135828.0      11.0  1.0   3.99   3.99   41.0
2018-01-26  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-27       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-28       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-29       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-30       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-31       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-22  704777.0    2.0  135828.0      11.0  1.0   4.99   3.99   45.0
2018-01-23  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   47.0
2018-01-24  704777.0    2.0  135828.0      11.0  4.0  12.97   3.99   48.0
2018-01-25  704777.0    2.0  135828.0      11.0  1.0   3.99   3.99   49.0
2018-01-26       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-27       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-28       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-29       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-30       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN
2018-01-31       NaN    NaN       NaN       NaN  NaN    NaN    NaN    NaN

Then fill Price, Qty and Value with 0 and use ffill to fill the remaining NaN values:

new_df = pd.concat(dfs)
new_df[['Price', 'Qty', 'Value']] = new_df[['Price', 'Qty', 'Value']].fillna(0)
new_df.ffill(inplace=True)
new_df
Out[17]: 
                 SKU  Store         F  LeadTime  Qty  Value  Price  Level
2018-01-22  504777.0    1.0  135828.0      11.0  1.0   3.99   3.99   45.0
2018-01-23  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   45.0
2018-01-24  504777.0    1.0  135828.0      11.0  3.0  11.97   3.99   42.0
2018-01-25  504777.0    1.0  135828.0      11.0  1.0   3.99   3.99   41.0
2018-01-26  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-27  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-28  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-29  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-30  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-31  504777.0    1.0  135828.0      11.0  0.0   0.00   0.00   41.0
2018-01-22  704777.0    2.0  135828.0      11.0  1.0   4.99   3.99   45.0
2018-01-23  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   47.0
2018-01-24  704777.0    2.0  135828.0      11.0  4.0  12.97   3.99   48.0
2018-01-25  704777.0    2.0  135828.0      11.0  1.0   3.99   3.99   49.0
2018-01-26  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   49.0
2018-01-27  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   49.0
2018-01-28  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   49.0
2018-01-29  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   49.0
2018-01-30  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   49.0
2018-01-31  704777.0    2.0  135828.0      11.0  0.0   0.00   0.00   49.0
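
The output above shows every column as float, because reindex introduces NaN, and Date sits in the index. A hedged sketch of one way to address both, assuming the fixed end date 2018-01-31 from the question instead of the month end; filling inside the loop also keeps values from carrying over between groups. The int_cols list is an assumption about which columns should be integers.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "SKU": [504777] * 5 + [704777] * 4,
    "Store": [1] * 5 + [2] * 4,
    "F": [135828] * 9,
    "LeadTime": [11] * 9,
    "Date": pd.to_datetime([
        "2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25", "2018-01-26",
        "2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25",
    ]),
    "Qty": [1, 0, 3, 1, 0, 1, 0, 4, 1],
    "Value": [3.99, 0.0, 11.97, 3.99, 0.0, 4.99, 0.0, 12.97, 3.99],
    "Price": [3.99, 0.0, 3.99, 3.99, 0.0, 3.99, 0.0, 3.99, 3.99],
    "Level": [45, 45, 42, 41, 41, 45, 47, 48, 49],
})

end = pd.Timestamp("2018-01-31")  # fixed target date taken from the question
int_cols = ["SKU", "Store", "F", "LeadTime", "Qty", "Level"]  # assumed integer columns

parts = []
for _, d in df.groupby(["SKU", "Store"]):
    d = d.set_index("Date")
    # Daily range from the group's own first date up to the fixed end date
    d = d.reindex(pd.date_range(d.index.min(), end, freq="D"))
    # Missing days have no demand; the other columns take the last known value
    d[["Qty", "Value", "Price"]] = d[["Qty", "Value", "Price"]].fillna(0)
    parts.append(d.ffill())

# Move the dates back into a Date column and restore the integer dtypes
new_df = pd.concat(parts).rename_axis("Date").reset_index()
new_df = new_df.astype({c: "int64" for c in int_cols})
```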