Autoregressive parameter for GEE in Python Statsmodels

Time: 2017-03-24 21:46:13

Tags: python statsmodels panel-data

I'm trying to run a GEE with an autoregressive correlation structure in statsmodels on some panel data, looking at how sales differ across time periods:

import statsmodels.api as sm

# BakeSale is a pandas DataFrame; result1 is an earlier fit (not shown)
ga = sm.families.Gaussian()
ar = sm.cov_struct.Autoregressive()
times = BakeSale['Hour'].values   # ordinal hour within the shift (1-8)
ar.dep_params = 0.06              # starting value for the AR parameter
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople",
                             groups=BakeSale["SalesPerson"],
                             data=BakeSale, family=ga, time=times, cov_struct=ar)
result2 = model2.fit(start_params=result1.params)
print(result2.summary())

This raises ValueError: Not a bracketing interval.
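For context: the ValueError is raised by SciPy's Brent minimiser (`scipy.optimize.brent`), which statsmodels calls to update the autoregressive parameter, as the traceback shows. Given a bracket (a, b, c), Brent requires f(b) to be smaller than both f(a) and f(c). The sketch below uses two made-up objectives, not the statsmodels one, to show that an objective returning NaN fails that check, because every comparison with NaN is False. Newer SciPy versions word the message differently, so it is printed rather than assumed.

```python
import numpy as np
from scipy.optimize import brent


def well_behaved(x):
    # Smooth objective with its minimum at x = 0.3
    return (x - 0.3) ** 2


def nan_objective(x):
    # Mimics an objective poisoned by NaN weights: every value is NaN
    return np.nan


bracket = (0.0, 0.5, 0.9)

# Valid bracket: f(0.5) = 0.04 is below f(0.0) = 0.09 and f(0.9) = 0.36
x_min = brent(well_behaved, brack=bracket)
print(round(x_min, 6))  # 0.3

# Same bracket, but NaN < NaN is False, so the bracket check fails
try:
    brent(nan_objective, brack=bracket)
except ValueError as err:
    print("ValueError:", err)
```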

I currently code the shift 'Hour' as ordinal integers (i.e. 1-8), but I also have timestamps.
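As I understand it, the statsmodels Autoregressive structure measures lags as differences between these time values within a group, so the coding matters. A minimal NumPy sketch of an AR(1) working correlation, corr(y_i, y_j) = rho^|t_i - t_j| (`ar1_working_corr` is a hypothetical helper, not statsmodels code): with ordinal hours 1-8 inside one group the matrix has full rank, but if two shifts land in the same group the hours repeat, identical rows appear, and the matrix becomes singular.

```python
import numpy as np


def ar1_working_corr(times, rho):
    # AR(1) working correlation: corr(y_i, y_j) = rho ** |t_i - t_j|
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])
    return rho ** lags


# One shift, ordinal hours 1..8: all time values distinct
hours = [1, 2, 3, 4, 5, 6, 7, 8]
R = ar1_working_corr(hours, 0.06)
print(np.linalg.matrix_rank(R))      # 8

# Two shifts pooled into one group: hours repeat, rows duplicate
repeated = [1, 2, 3, 4, 1, 2, 3, 4]
R_rep = ar1_working_corr(repeated, 0.06)
print(np.linalg.matrix_rank(R_rep))  # 4
```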

Any ideas on how to get around this?

Full output:

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:724: RuntimeWarning: divide by zero encountered in true_divide
  wts = 1. / var
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:725: RuntimeWarning: invalid value encountered in true_divide
  wts /= wts.sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-d81d0b97546e> in <module>()
      7 #CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople"
      8 # Maybe try without C, or find if any with nan value or such
----> 8 result2 = model2.fit(start_params=result1.params)
      9 print(result2.summary())
     10 print(ar.summary())

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in fit(self, maxiter, ctol, start_params, params_niter, first_dep_update, cov_type, ddof_scale, scaling_factor)
   1111             if (self.update_dep and (itr % params_niter) == 0
   1112                 and (itr >= first_dep_update)):
-> 1113                 self._update_assoc(mean_params)
   1114                 num_assoc_updates += 1
   1115 

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in _update_assoc(self, params)
   1259         """
   1260 
-> 1261         self.cov_struct.update(params)
   1262 
   1263     def _derivative_exog(self, params, exog=None, transform='dydx',

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py in update(self, params)
    766 
    767         from scipy.optimize import brent
--> 768         self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
    769 
    770 

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in brent(func, args, brack, tol, full_output, maxiter)
   2001     options = {'xtol': tol,
   2002                'maxiter': maxiter}
-> 2003     res = _minimize_scalar_brent(func, brack, args, **options)
   2004     if full_output:
   2005         return res['x'], res['fun'], res['nit'], res['nfev']

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in _minimize_scalar_brent(func, brack, args, xtol, maxiter, **unknown_options)
   2033                   full_output=True, maxiter=maxiter)
   2034     brent.set_bracket(brack)
-> 2035     brent.optimize()
   2036     x, fval, nit, nfev = brent.get_result(full_output=True)
   2037     return OptimizeResult(fun=fval, x=x, nit=nit, nfev=nfev,

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in optimize(self)
   1839         # set up for optimization
   1840         func = self.func
-> 1841         xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
   1842         _mintol = self._mintol
   1843         _cg = self._cg

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in get_bracket_info(self)
   1827             fc = func(*((xc,) + args))
   1828             if not ((fb < fa) and (fb < fc)):
-> 1829                 raise ValueError("Not a bracketing interval.")
   1830             funcalls = 3
   1831         else:

ValueError: Not a bracketing interval.
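The two RuntimeWarnings at the top of the output point at a likely cause. As far as I can tell from the statsmodels source (this may differ between versions), the Autoregressive update gives each within-group pair with time gap d a variance of (1 - rho^(2d)) / (1 - rho^2) and then computes `wts = 1. / var`. A gap of d = 0, i.e. two rows in the same group sharing a time value, makes that variance zero, the weight infinite and, after normalisation, NaN, which then makes the Brent objective NaN. Grouping by SalesPerson, where each salesperson's shifts presumably all reuse hours 1-8, would produce such zero gaps. The helper `ar_pair_weights` is a hypothetical NumPy reimplementation of just those two lines:

```python
import numpy as np


def ar_pair_weights(gaps, rho):
    """Normalised weights for within-group pairs at the given time gaps.

    Assumed to mirror cov_struct.py lines 724-725 from the warnings;
    the variance formula is my reading of the statsmodels source.
    """
    gaps = np.asarray(gaps, dtype=float)
    var = (1.0 - rho ** (2 * gaps)) / (1.0 - rho ** 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        wts = 1.0 / var    # a gap of 0 gives var == 0, so this is inf
        wts /= wts.sum()   # inf / inf -> nan
    return wts


rho = 0.06
# Pairwise gaps for one shift with hours 1..4: all gaps >= 1
clean_gaps = [1, 2, 3, 1, 2, 1]
# Same shift with one hour entered twice: one pair has a gap of 0
dup_gaps = clean_gaps + [0]

print(np.isfinite(ar_pair_weights(clean_gaps, rho)).all())  # True
print(np.isnan(ar_pair_weights(dup_gaps, rho)).any())       # True
```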

1 Answer:

Answer 0 (score: 0)

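As so often in life, you need to make sure you are starting with the right data. For example, using individual shifts as the groups instead of salespeople: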

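ex = sm.cov_struct.Exchangeable()  # assumed: `ex` is not defined in the answer
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople", groups=BakeSale["Shift"],
              data=BakeSale, family=ga, time=times, cov_struct=ex)
print(model2.fit().summary())  # the summary lists min/max/mean cluster size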
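The cluster sizes that the GEE summary reports (minimum, maximum, mean) can also be checked directly with pandas before fitting. A sketch on a small hypothetical frame in which shift "B" has two extra rows:

```python
import pandas as pd

# Hypothetical toy data: shift "A" is correct, shift "B" has two extra rows
bake_sale = pd.DataFrame({
    "Shift": ["A"] * 8 + ["B"] * 10,
    "Hour": list(range(1, 9)) + list(range(1, 9)) + [3, 4],
})

# Rows per group, i.e. the GEE cluster sizes
cluster_sizes = bake_sale.groupby("Shift").size()
print(cluster_sizes.max())   # 10
print(cluster_sizes.mean())  # 9.0
```

A mean above 8 is already a red flag when no shift should have more than 8 hours.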

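This revealed a suspicious maximum cluster size and a mean cluster size just over 8, which should not be possible if every shift has at most 8 hours.

Reviewing how the original dataset had been wrangled showed that several shifts were miscoded, with many more rows than a shift has hours. Once this was corrected, the model ran normally.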
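Because repeated time values within a group produce zero time gaps, which the Autoregressive structure does not appear to handle, a quick pre-fit check is to look for repeated (group, time) pairs. A sketch with a hypothetical helper and toy data (one salesperson working two short shifts, one of them miscoded), comparing grouping by shift with grouping by salesperson:

```python
import pandas as pd


def rows_with_repeated_times(df, group_col, time_col):
    # Rows whose (group, time) pair occurs more than once in df
    return df[df.duplicated([group_col, time_col], keep=False)]


# Hypothetical data: "Ann" works two 4-hour shifts, and shift "B" is
# miscoded with two extra rows (hours 3 and 4 entered twice)
bake_sale = pd.DataFrame({
    "SalesPerson": ["Ann"] * 10,
    "Shift": ["A"] * 4 + ["B"] * 6,
    "Hour": [1, 2, 3, 4] + [1, 2, 3, 4, 3, 4],
})

by_shift = rows_with_repeated_times(bake_sale, "Shift", "Hour")
by_person = rows_with_repeated_times(bake_sale, "SalesPerson", "Hour")
print(len(by_shift))   # 4: hours 3 and 4 repeat within shift B
print(len(by_person))  # 10: pooling both shifts repeats every hour
```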