I am trying to minimize forecast error by choosing the right "fall out rate" (r). I am still new to Pandas and brand new to SciPy. Please help!
import pandas as pd
from scipy.optimize import fmin
data = pd.DataFrame({'Division': [1,2,3]*3,
                     'Month': ['May','May','May','June','June','June','Jul','Jul','Jul'],
                     'Definite_Units': [8]*9,
                     'Maybe_Units': [3,2,1]*3,
                     'Actually_Shipped_Units': [9]*9})
# forecast = definite units + maybe units * fall out rate
p = lambda r, x, y: x + y * r
# absolute percentage error of the forecast against the units actually shipped
e = lambda r, x, y, z: abs(1 - (p(r, x, y) / z))
for d in range(1, 4):
    r0 = 1
    # keep only the rows for this division
    div_data = data[data['Division'] == d]
    x = div_data['Definite_Units'].sum()
    y = div_data['Maybe_Units'].sum()
    z = div_data['Actually_Shipped_Units'].sum()
    t = fmin(e, r0, args=(x, y, z))
    print(d, t)
I want one r per division that minimizes e.
So in this case my output should be:
Answer 0 (score: 0)
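I learned a few things about fmin on this project:
- I passed the arguments to fmin as NumPy arrays, so I wrote the return_array helper function.
- The variable to optimize must be listed first in the function being minimized. So for me it had to be e(r,c,u,s), not e(c,u,s,r).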
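Here is a minimal sketch of both points, separate from the full script. The name `objective` and the three-row arrays are made up for illustration. fmin varies only the first parameter and hands everything in `args` to the function unchanged.

```python
import numpy as np
from scipy.optimize import fmin

def objective(r, c, u, s):
    # r arrives from fmin as a 1-element NumPy array; c, u, s come from args
    forecast = c + u * r
    return np.mean(np.abs(1.0 - forecast / s))

# illustrative data: 8 definite units, 2 maybe units, 9 shipped units per row
c = np.array([8.0, 8.0, 8.0])
u = np.array([2.0, 2.0, 2.0])
s = np.array([9.0, 9.0, 9.0])

# disp=False suppresses the convergence report
best_r = fmin(objective, [1.0], args=(c, u, s), disp=False)
print(best_r)
```

If r were listed last, as in e(c,u,s,r), fmin would still treat the first parameter as the one to vary, so it would end up changing c instead.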
# calculate new fall out rates with fmin
import numpy as np
import pandas as pd
from scipy.optimize import fmin
data = pd.DataFrame({'DIV': [1,2,3]*3,
                     'MONTH': ['May','May','May','June','June','June','Jul','Jul','Jul'],
                     'C': [8]*9,
                     'U': [3,2,1]*3,
                     'S': [9]*9})
# optional: save a copy of the test data (this path is specific to my machine)
data.to_csv(r'C:\Users\mbabski\Documents\Unit Plan Summer 2016\data_test.csv')
def return_array(x):
    # convert a pandas Series to its underlying NumPy array
    return x.values
def mape(c, u, s, r):  # returns an array of line-level absolute percentage errors
    p = c + u * r  # calculates the forecasted number
    m = np.abs(1.0 - (p / s))  # calculates the absolute percentage error at the line level
    return m
def e(r, c, u, s):  # calculates the average of the line-level errors (the MAPE)
    return np.mean(mape(c, u, s, r))
for d in range(1, 4):
    div_data = data[data.DIV == d]
    c = return_array(div_data.C)
    u = return_array(div_data.U)
    s = return_array(div_data.S)
    r0 = [1.0]  # initial guess for r
    t = fmin(e, r0, args=(c, u, s))
    print('r:', t)
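Optimization terminated successfully.
         Current function value: 0.000011
         Iterations: 16
         Function evaluations: 32
r: [0.33330078]
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 15
         Function evaluations: 30
r: [0.5]
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 10
         Function evaluations: 20
r: [1.]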
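As a sanity check on these values, here is a sketch that assumes the same test data as above. Every row within a division is identical in this sample, so the error is exactly zero when c + u*r = s, that is r = (s - c)/u. The column name `r_exact` is made up for illustration. This shortcut only works because the rows match; with real data there is generally no single r that zeroes every row, which is why the optimizer is needed.

```python
import pandas as pd

data = pd.DataFrame({'DIV': [1, 2, 3] * 3,
                     'C': [8] * 9,
                     'U': [3, 2, 1] * 3,
                     'S': [9] * 9})

# r that makes each row's forecast equal its shipped units: c + u * r == s
data['r_exact'] = (data.S - data.C) / data.U

# rows within a division are identical here, so take the first one per division
exact = data.groupby('DIV')['r_exact'].first()
print(exact)
```

These come out to 1/3, 0.5 and 1.0 for divisions 1, 2 and 3, which agrees with the fmin results to within its default tolerance.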