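I am trying to run a GEE in statsmodels on some panel data, using an autoregressive covariance structure, to look at differences in sales between hours of the day: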
import statsmodels.api as sm

ga = sm.families.Gaussian()
ar = sm.cov_struct.Autoregressive()
ar.dep_params = 0.06  # starting value for the autoregressive dependence parameter
times = BakeSale['Hour'].values  # hour within the shift, coded 1-8
model2 = sm.GEE.from_formula(
    "CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople",
    groups=BakeSale["SalesPerson"], data=BakeSale,
    family=ga, time=times, cov_struct=ar)
# result1 is an earlier fit that is not shown here
result2 = model2.fit(start_params=result1.params)
print(result2.summary())
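This raises ValueError: Not a bracketing interval.
I currently have the shift 'Hour' coded as ordinal integers (i.e. 1-8), but I also have timestamps.
Any ideas on how to get around this?
Full output: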
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:724: RuntimeWarning: divide by zero encountered in true_divide
wts = 1. / var
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:725: RuntimeWarning: invalid value encountered in true_divide
wts /= wts.sum()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-81-d81d0b97546e> in <module>()
7 #CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople"
8 # Maybe try without C, or find if any with nan value or such
----> 8 result2 = model2.fit(start_params=result1.params)
9 print(result2.summary())
10 print(ar.summary())
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in fit(self, maxiter, ctol, start_params, params_niter, first_dep_update, cov_type, ddof_scale, scaling_factor)
1111 if (self.update_dep and (itr % params_niter) == 0
1112 and (itr >= first_dep_update)):
-> 1113 self._update_assoc(mean_params)
1114 num_assoc_updates += 1
1115
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in _update_assoc(self, params)
1259 """
1260
-> 1261 self.cov_struct.update(params)
1262
1263 def _derivative_exog(self, params, exog=None, transform='dydx',
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py in update(self, params)
766
767 from scipy.optimize import brent
--> 768 self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
769
770
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in brent(func, args, brack, tol, full_output, maxiter)
2001 options = {'xtol': tol,
2002 'maxiter': maxiter}
-> 2003 res = _minimize_scalar_brent(func, brack, args, **options)
2004 if full_output:
2005 return res['x'], res['fun'], res['nit'], res['nfev']
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in _minimize_scalar_brent(func, brack, args, xtol, maxiter, **unknown_options)
2033 full_output=True, maxiter=maxiter)
2034 brent.set_bracket(brack)
-> 2035 brent.optimize()
2036 x, fval, nit, nfev = brent.get_result(full_output=True)
2037 return OptimizeResult(fun=fval, x=x, nit=nit, nfev=nfev,
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in optimize(self)
1839 # set up for optimization
1840 func = self.func
-> 1841 xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
1842 _mintol = self._mintol
1843 _cg = self._cg
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in get_bracket_info(self)
1827 fc = func(*((xc,) + args))
1828 if not ((fb < fa) and (fb < fc)):
-> 1829 raise ValueError("Not a bracketing interval.")
1830 funcalls = 3
1831 else:
ValueError: Not a bracketing interval.
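Answer 0 (score: 0)
As so often in life, one needs to make sure one starts from the right data. For example, grouping by individual shift rather than by salesperson: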
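# Assumed: `ex` is not defined in the original answer; an exchangeable structure is the likely intent
ex = sm.cov_struct.Exchangeable()
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople", groups=BakeSale["Shift"],
                             data=BakeSale, family=ga, time=times, cov_struct=ex)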
showed that the maximum cluster size was suspicious, while the mean cluster size was just over 8.
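The same cluster-size check can be done directly in pandas before fitting anything. This is only a rough sketch on made-up data: the `toy` frame, the `MAX_HOURS` limit, and the deliberately miscoded shift "C" are assumptions for illustration, not the real BakeSale data.

```python
import pandas as pd

# Hypothetical toy data: one row per (shift, hour) observation.
# Shift "C" is deliberately miscoded: hours 1-4 appear twice, giving 12 rows.
toy = pd.DataFrame({
    "Shift": ["A"] * 8 + ["B"] * 8 + ["C"] * 12,
    "Hour": list(range(1, 9)) * 2 + list(range(1, 9)) + [1, 2, 3, 4],
})

MAX_HOURS = 8  # assumed shift length, taken from the 1-8 coding in the question

# Number of rows in each cluster (each value of the grouping column).
cluster_sizes = toy.groupby("Shift").size()

# Clusters that contain more rows than a shift has hours.
oversized = cluster_sizes[cluster_sizes > MAX_HOURS]

# Rows whose (Shift, Hour) pair occurs more than once within the data.
duplicated_hours = toy[toy.duplicated(["Shift", "Hour"], keep=False)]

print(cluster_sizes.describe())
print(oversized)
print(duplicated_hours)
```

On real data, replacing `toy` with `BakeSale` and comparing the maximum against the expected shift length should point at the offending shifts.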
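Going back over how the original dataset had been wrangled showed that several shifts had been miscoded with far more hours than a shift should have. Once that was corrected, the model ran fine...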
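As for the error message itself: the traceback shows that SciPy's `brent` is handed a three-point bracket and raises if the middle point does not have the lowest function value. The sketch below reproduces only that check, with toy objective functions that are assumptions for illustration; it says nothing about what statsmodels' actual fitting function looked like on the bad data.

```python
from scipy.optimize import brent


def increasing(x):
    # Monotonically increasing, so the middle bracket point can never be the lowest.
    return x


try:
    brent(increasing, brack=(0.0, 1.0, 2.0))
    raised = False
except ValueError as err:
    # The exact wording of the message differs between SciPy versions.
    raised = True
    print("brent rejected the bracket:", err)

# A bracket whose middle point has the lowest value is accepted.
x_min = brent(lambda x: (x - 1.0) ** 2, brack=(0.0, 0.9, 3.0))
print(x_min)
```

So the ValueError is a symptom that the function being minimised was ill-behaved at the three points tried, which fits with the divide-by-zero warnings that precede it in the output.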