I'm (trying to) automate some KPI charts that are currently maintained by hand by three product managers and their whole team (data goes from SQL Server to Excel to PowerPoint to a display monitor). I can pull the sales data from SQL into Python and build the necessary charts in matplotlib. The process runs once a week, so each cycle covers Monday through Saturday (there are no Sunday sales).
Sales are tagged by product manager (PIC) and category, so occasionally a stray sale shows up with no tag at all. My problem is that with groupby I only get the days that actually had sales, not the whole date range, so when I try to build the plots I get an error because the series don't all have the same number of points.
I've spent three days fighting this on my own and I'm finally giving up.
Right now the DataFrame code looks like this: df = pd.DataFrame(SQL_Query, columns=['SO_DATE', 'SALES', 'PARENT_CATEGORY', 'PIC']), and the current week's range is 4/1/2019 - 4/6/2019.
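For reference, the new_index used further down is just the full Monday-through-Saturday range for that week, and a simplified sketch of the plotting step that blows up looks roughly like this (curly, fig, and ax here are just for illustration):

import matplotlib.pyplot as plt

new_index = pd.date_range('2019-04-01', '2019-04-06', freq='D')  # full Mon-Sat week, no Sunday sales
curly = df[df['PIC'] == 'Curly'].groupby('SO_DATE')['SALES'].sum()  # only the days Curly actually had sales
fig, ax = plt.subplots()
ax.plot(new_index, curly)  # ValueError: x and y must have same first dimension (6 points vs 5)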
I've tried asfreq, reindex, and many variations of the following:
[In] : df['SO_DATE'] = pd.to_datetime(df['SO_DATE'])
df.set_index('SO_DATE').groupby(['PIC', 'PARENT_CATEGORY'], sort=False)['SALES'].resample('D').asfreq().fillna(0).reset_index()
[Out] :
PIC PARENT_CATEGORY SO_DATE SALES
0 Curly Spam 2019-04-01 23209.47
1 Curly Eggs 2019-04-02 67969.84
2 Curly SpamSpam 2019-04-03 19924.44
3 Curly EggsEggs 2019-04-04 17005.59
4 Curly EggsSpam 2019-04-06 328.06
5 Moe Spam 2019-04-01 11750.58
6 Moe Eggs 2019-04-02 12187.02
7 Moe SpamSpam 2019-04-03 5003.66
8 Moe EggsEggs 2019-04-04 6026.33
9 Moe SpamEggs 2019-04-05 10344.57
10 Moe EggsSpam 2019-04-06 1816.41
11 Larry Spam 2019-04-01 11489.23
12 Larry SpamSpam 2019-04-03 7915.24
13 Larry EggsEggs 2019-04-04 5993.43
14 Larry SpamEggs 2019-04-06 332.98
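(As you can see, Curly has no row at all for 2019-04-05, and Larry is missing 2019-04-02 and 2019-04-05; the days with no sales just never show up, so there is nothing to zero-fill.)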
[In] : df[df.PIC.isnull()].groupby(['SO_DATE', 'SALES']).sum().unstack(fill_value=0).stack()
[Out] :
PARENT_CATEGORY PIC
SO_DATE SALES
2019-04-02 332.5 0 0
851.5 0 0
2727.2 0 0
2019-04-03 332.5 0 0
851.5 0 0
2727.2 0 0
2019-04-05 332.5 0 0
851.5 0 0
2727.2 0 0
[In] : nopm = df[df.PIC.isnull()]
nopm.set_index('SO_DATE')['SALES'].resample('D').asfreq().fillna(0).reset_index()
[Out]:
SO_DATE SALES
0 2019-04-02 851.5
1 2019-04-03 2727.2
2 2019-04-04 0.0
3 2019-04-05 332.5
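(Even this one is missing the 1st and the 6th, since resample only covers the range of dates that actually appear in that subset, 2019-04-02 through 2019-04-05.)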
[In] :
pd.DataFrame(index=new_index, columns={'Curly': df[df['PIC'] == 'Curly'].groupby('SO_DATE')['SALES'].sum()})
[Out] :
Curly
2019-04-01 NaN
2019-04-02 NaN
2019-04-03 NaN
2019-04-04 NaN
2019-04-05 NaN
2019-04-06 NaN
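I'm guessing that last one comes back all NaN because columns= only supplies the column labels rather than the data, but either way it isn't what I need.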
What I want is this:
PIC PARENT_CATEGORY SO_DATE SALES
0 Curly Spam 2019-04-01 23209.47
1 Curly Eggs 2019-04-02 67969.84
2 Curly SpamSpam 2019-04-03 19924.44
3 Curly EggsEggs 2019-04-04 17005.59
4 Curly SpamEggs 2019-04-05 0.00
5 Curly EggsSpam 2019-04-06 328.06
6 Moe Spam 2019-04-01 11750.58
7 Moe Eggs 2019-04-02 12187.02
8 Moe SpamSpam 2019-04-03 5003.66
9 Moe EggsEggs 2019-04-04 6026.33
10 Moe SpamEggs 2019-04-05 10344.57
11 Moe EggsSpam 2019-04-06 1816.41
12 Larry Spam 2019-04-01 11489.23
13 Larry Eggs 2019-04-02 0.00
14 Larry SpamSpam 2019-04-03 7915.24
15 Larry EggsEggs 2019-04-04 5993.43
16 Larry SpamEggs 2019-04-05 0.00
17 Larry EggsSpam 2019-04-06 332.98
18 NaN Spam 2019-04-01 0.00
19 NaN Eggs 2019-04-02 851.5
20 NaN SpamSpam 2019-04-03 2727.2
21 NaN EggsEggs 2019-04-04 0.00
22 NaN SpamEggs 2019-04-05 332.5
23 NaN EggsSpam 2019-04-06 0.00
Or, when doing a groupby for any one PIC (including the NULL ones!), get this:
SALES
2019-04-01 0.0
2019-04-02 851.5
2019-04-03 2727.2
2019-04-04 0.0
2019-04-05 332.5
2019-04-06 0.0
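In other words, I need every series, whether it's a PIC/PARENT_CATEGORY combination or a single PIC (NULLs included), padded out to all six days of the Monday-Saturday week with 0.0 on the days that had no sales, so everything has the same length when it gets plotted.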