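I'm trying to produce some plots and statistics for our data. The code below fails when the dates at the start or end of the range have zero values. For example, if 02/10/17 has a value of zero while 03/10/17 has a real value, the resample does not include 02/10/17. This skews the averages, and I would also like those dates to show up on the plot.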
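import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# df is the full DataFrame; 'Created' is a datetime column
Gamesdf = df[df['Product Group'] == 'Video Games']
fig = plt.figure()
ax2 = fig.add_subplot(111)
ax22 = ax2.twinx()  # second y-axis sharing the same x-axis
# number of sales per calendar day (spans only Gamesdf's own first..last date)
s1 = Gamesdf.resample('D', on='Created').size()
# machine count recorded on each day
s2 = Gamesdf.groupby('Created')['Machine Count'].first()
s = s1/s2  # sales per machine
s1.plot(kind='bar', ax=ax2, position=0, label='Sales Total', width=0.25)
s.plot(kind='bar', ax=ax22, color='red', position=1, label='Adjusted For Machine Count', width=0.25)
ax2.set_ylim(0,80)
ax22.set_ylim(0,80)
# plt.legend() only picks up the current (twin) axes, so merge the handles of both
h1, l1 = ax2.get_legend_handles_labels()
h2, l2 = ax22.get_legend_handles_labels()
ax22.legend(h1 + h2, l1 + l2, loc='upper left')
# label the bars with their dates
ticklabels = s.index.strftime('%Y-%m-%d')
ax2.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
plt.show()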
print('Average Deposited Per Day:', s1.mean())
print('Average Deposited Per Day Per Machine:', s.mean())
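The edge effect can be reproduced in isolation: resample only creates bins between the first and last timestamps of the frame it is called on, so filtering down to one product group first shrinks the range. A minimal sketch on made-up toy data (hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical toy data: the full frame covers 2017-10-03..2017-10-06,
# but 'Video Games' only has sales on the 4th and 5th.
df = pd.DataFrame({
    "Created": pd.to_datetime(["2017-10-03", "2017-10-04", "2017-10-04",
                               "2017-10-05", "2017-10-06"]),
    "Product Group": ["Music", "Video Games", "Video Games",
                      "Video Games", "Music"],
})
games = df[df["Product Group"] == "Video Games"]

# resample() only spans the subset's own min..max timestamps.
daily = games.resample("D", on="Created").size()
print(daily.index.min().date(), daily.index.max().date())  # 2017-10-04 2017-10-05
print(daily.mean())  # 1.5, i.e. (2 + 1) / 2, ignoring the zero-sale 3rd and 6th
```

Over the full four days the per-day average would be 3 / 4 = 0.75, not 1.5.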
How can I take the date range found in my original DataFrame, df, and force it onto the s1 and s2 series?
This example should span 03/10/17 to 30/10/17.
Sample of the data, produced by running:
df.sample(25).sort_values('Created')
         Created    Product Group  Machine Count
51    2017-10-09         Wireless              3
191   2017-10-14  DVDs & Blu-rays              3
87    2017-10-14  DVDs & Blu-rays              3
74    2017-10-14  DVDs & Blu-rays              3
152   2017-10-14  DVDs & Blu-rays              3
243   2017-10-17  DVDs & Blu-rays              3
255   2017-10-17  DVDs & Blu-rays              3
334   2017-10-18  DVDs & Blu-rays              6
419   2017-10-21  DVDs & Blu-rays             11
478   2017-10-21      Video Games             11
499   2017-10-21      Video Games             11
502   2017-10-21            Music             11
371   2017-10-21  DVDs & Blu-rays             11
610   2017-10-22      Video Games             11
675   2017-10-23  DVDs & Blu-rays             11
766   2017-10-23      Video Games             11
738   2017-10-23  DVDs & Blu-rays             11
866   2017-10-25  DVDs & Blu-rays             11
871   2017-10-25  DVDs & Blu-rays             11
806   2017-10-25  DVDs & Blu-rays             11
907   2017-10-25      Video Games             11
993   2017-10-26  DVDs & Blu-rays             11
938   2017-10-26  DVDs & Blu-rays             11
997   2017-10-26            Music             11
1115  2017-10-29  DVDs & Blu-rays             11
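One way to force the whole range onto both series is to build every calendar day of df with pd.date_range and reindex onto it. A minimal sketch, assuming: Created is a datetime column; the wanted range is the first to the last Created date in the full df; a day with no sales counts as 0; and Machine Count is the same for every product group on a given day (as it is in the sample above), with days that have no rows at all filled from the previous known day via ffill (and bfill for any leading gap). The helper full_range_series is hypothetical, and the fill rule for s2 is an assumption to adjust to how machines were actually added.

```python
import pandas as pd


def full_range_series(df, group):
    """Hypothetical helper: return (s1, s2, s) for one product group,
    reindexed onto every calendar day covered by the whole DataFrame df."""
    # Every day from the first to the last 'Created' date in the full df.
    full_range = pd.date_range(df["Created"].min().normalize(),
                               df["Created"].max().normalize(), freq="D")
    sub = df[df["Product Group"] == group]

    # Sales per day; days with no rows for this group become 0, not missing.
    s1 = sub.resample("D", on="Created").size().reindex(full_range, fill_value=0)

    # Machine count per day, read from all groups (assumes it is shared across
    # groups on a given day); remaining gaps are filled from neighbouring days.
    s2 = (df.groupby(df["Created"].dt.normalize())["Machine Count"].first()
            .reindex(full_range).ffill().bfill())

    s = s1 / s2  # sales per machine; 0 on days with no sales
    return s1, s2, s


# A few rows copied from the data sample above.
sample = pd.DataFrame({
    "Created": pd.to_datetime(["2017-10-09", "2017-10-14", "2017-10-21",
                               "2017-10-22", "2017-10-29"]),
    "Product Group": ["Wireless", "DVDs & Blu-rays", "Video Games",
                      "Video Games", "DVDs & Blu-rays"],
    "Machine Count": [3, 3, 11, 11, 11],
})
s1, s2, s = full_range_series(sample, "Video Games")
print(len(s1), s1.sum())  # 21 2
print(s1.mean())          # averaged over all 21 days, zero-sale days included
```

The returned s1 and s can be passed to the same plot and mean() calls as in the original code; the zero-sale days are now rows with value 0, so they get a tick label on the plot and pull the averages down accordingly.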