I want to use Huber's joint estimator of scale and mean, described here: http://www.statsmodels.org/dev/generated/statsmodels.robust.scale.Huber.html but I get this error:
In [1]: from statsmodels.robust.scale import huber
In [2]: huber([1,2,1000,3265,454])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-80c7d73a4467> in <module>()
----> 1 huber([1,2,1000,3265,454])
/usr/local/lib/python3.5/dist-packages/statsmodels/robust/scale.py in __call__(self, a, mu, initscale, axis)
132 scale = tools.unsqueeze(scale, axis, a.shape)
133 mu = tools.unsqueeze(mu, axis, a.shape)
--> 134 return self._estimate_both(a, scale, mu, axis, est_mu, n)
135
136 def _estimate_both(self, a, scale, mu, axis, est_mu, n):
/usr/local/lib/python3.5/dist-packages/statsmodels/robust/scale.py in _estimate_both(self, a, scale, mu, axis, est_mu, n)
176 else:
177 return nmu.squeeze(), nscale.squeeze()
--> 178 raise ValueError('joint estimation of location and scale failed to converge in %d iterations' % self.maxiter)
179
180 huber = Huber()
ValueError: joint estimation of location and scale failed to converge in 30 iterations
Strangely, it depends on the input:
In [3]: huber([1,2,1000,3265])
Out[3]: (array(1067.0), array(1744.3785635989168))
Is this a bug, or am I doing something wrong here?
Thanks
Edit: I know about the tol and maxiter parameters, which help in the case you mention, but here is an example where they do not:
In [1]: a=[4.3498776644415429, 16.549773154535362, 4.6335866963356445, 8.2581784707468771, 1.3508951981036594, 1.2918098244960199, 5.734
...: 9939516388453, 0.41663442483143953, 4.5632532990486077, 8.1020487048604473, 1.3823829480004797, 1.7848176927929804, 4.3058348043
...: 423473, 0.9427710734983884, 0.95646846668018171, 0.75309469901235238, 8.4689505489677011, 0.77420558084543778, 0.765060223824508
...: 45, 1.5673666392992407, 1.4109878442590897, 0.45592078018861532, 4.71748181503082, 0.65942167325205436, 0.19099796838644958, 1.0
...: 979997466466069, 4.8145761128848106, 0.75417363824157768, 5.0723603274833362, 0.30627007428414721, 4.8178689054947981, 1.5383475
...: 959362511, 0.7971041296695851, 4.689826268915076, 8.6704498595703274, 0.56825576954483947, 8.0383098149129708, 0.394000842811084
...: 22, 0.89827542590321019, 8.5160701523615785, 9.0413284666560934, 1.3590549231652516, 8.355489609767794, 4.2413169378427682, 4.84
...: 97143419119348, 4.8566372637376292, 0.80979444214378904, 0.26613505510736446, 1.1525345100417608, 4.9784132426823824, 1.07663603
...: 91211101, 1.9604545887151259, 0.77151237419054963, 1.2302626325699455, 0.846912462599126, 0.85852710339862037, 0.380355420248302
...: 99, 4.7586522644359093, 0.46796412732813891, 0.52933680009769146, 5.2521765047159708, 0.71915381047435945, 1.3502865819436387, 0
...: .76647272458736559, 1.1206637428992841, 0.72560665950851866, 4.4248008256265781, 4.7984989298357457, 1.0696617588880453, 0.71104
...: 701759920497, 0.46986438176394463, 0.71008686283792688, 0.40698839770374351, 1.0015132141773508, 1.3825224746094535, 0.932562703
...: 04709066, 8.8896053101317687, 0.64148877800521564, 0.69250319745644506, 4.7187793763802919, 5.0620089438920939, 5.17105647739872
...: 33, 9.5341720525579809, 0.43052713463119635, 0.79288845392647533, 0.51059695992994469, 0.48295891743804287, 0.93370512281086504,
...: 1.7493284310512855, 0.62744557356984221, 5.0965146009791704, 0.12615625248684664, 1.1064189602023351, 0.33183381198282491, 4.90
...: 32450273833179, 0.90296573725985785, 1.2885647882049298, 0.84669066664867576, 1.1481783837280477, 0.94784483590946278, 9.8019240
...: 792478755, 0.91501030105202807, 0.57121190468293803, 5.5511993201050887, 0.66054793663263078, 9.6626055869916065, 5.262806161853
...: 6908, 9.5905100705465696, 0.70369230764306401, 8.9747551552440186, 1.572014845182425, 1.9571634928868149, 0.62030418652298325, 0
...: .3395356767840213, 0.48287760518144929, 4.7937042347984198, 0.74251393675618682, 0.87369567300592954, 4.5381205696031586, 5.2673
...: 192797619084]
In [2]: from statsmodels.robust.scale import huber, Huber
In [3]: Huber(maxiter=10000,tol=1e-1)(a)
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py:168: RuntimeWarning: invalid value encountered in sqrt
/ (n * self.gamma - (a.shape[axis] - card) * self.c**2))
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py:164: RuntimeWarning: invalid value encountered in less_equal
subset = np.less_equal(np.fabs((a - mu)/scale), self.c)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-4b9929ff84bb> in <module>()
----> 1 Huber(maxiter=10000,tol=1e-1)(a)
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py in __call__(self, a, mu, initscale, axis)
132 scale = tools.unsqueeze(scale, axis, a.shape)
133 mu = tools.unsqueeze(mu, axis, a.shape)
--> 134 return self._estimate_both(a, scale, mu, axis, est_mu, n)
135
136 def _estimate_both(self, a, scale, mu, axis, est_mu, n):
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py in _estimate_both(self, a, scale, mu, axis, est_mu, n)
176 else:
177 return nmu.squeeze(), nscale.squeeze()
--> 178 raise ValueError('joint estimation of location and scale failed to converge in %d iterations' % self.maxiter)
179
180 huber = Huber()
ValueError: joint estimation of location and scale failed to converge in 10000 iterations
Sorry, this was my original error, but because "a" is long I tried to reproduce the error with a smaller array. In this case I don't think maxiter and tol are the culprits.
Answer 0 (score: 0)
When using the Huber class, the allowed number of iterations, maxiter, can be changed.
For example, this works:
>>> from statsmodels.robust.scale import huber, Huber
>>> Huber(maxiter=200)([1,2,1000,3265,454])
(array(925.6483958529737), array(1497.0624070525248))
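When using the class, the threshold parameter of the norm function can also be changed. In very small samples like this one, the estimate can be very sensitive to the threshold parameter.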
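To illustrate how much the threshold matters on the five-point sample, here is a minimal pure-Python sketch. It is not the statsmodels joint estimator: it keeps the scale fixed at the normalized MAD and only iterates the location, and the function name `huber_location` is made up for this illustration.

```python
import statistics

def huber_location(data, c=1.5, tol=1e-8, maxiter=100):
    """Huber-type location estimate with the scale held fixed.

    Simplification (assumption): the scale is the normalized MAD and is
    not updated, unlike a joint location/scale estimator.
    """
    mu = statistics.median(data)
    scale = statistics.median(abs(x - mu) for x in data) / 0.6745
    for _ in range(maxiter):
        lo, hi = mu - c * scale, mu + c * scale
        # Clip each observation to [lo, hi] and average (winsorized mean step).
        new_mu = sum(min(max(x, lo), hi) for x in data) / len(data)
        if abs(new_mu - mu) <= tol * scale:
            return new_mu
        mu = new_mu
    raise ValueError("location estimate did not converge")

sample = [1, 2, 1000, 3265, 454]
for c in (0.5, 1.5, 3.0, 10.0):
    print(c, huber_location(sample, c=c))
```

With a small threshold the estimate stays at the median (454), and with a very large one nothing is clipped and it becomes the plain mean (944.4), so on five points the choice of threshold moves the result across that whole range.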
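As an alternative, we can use the RLM model and regress on a constant. Both the thresholds and the algorithm differ, but it should produce similarly robust results. In the new example, the scale estimate lies between the standard deviation and the robust MAD, and the mean estimate lies between the median and the mean.
>>> import numpy as np
>>> from statsmodels.robust import norms, scale
>>> from statsmodels.robust.robust_linear_model import RLM
>>> res = RLM(a, np.ones(len(a)), M=norms.HuberT(t=1.5)).fit(scale_est=scale.HuberScale(d=1.5))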
>>> res.params, res.scale
(array([ 2.47711987]), 2.5218278029435406)
>>> np.median(a), scale.mad(a)
(1.1503564468849041, 0.98954533464908301)
>>> np.mean(a), np.std(a)
(2.8650886010542269, 3.0657561979615977)
The resulting weights show that some of the high values are downweighted:
>>> widx = np.argsort(res.weights)
>>> np.asarray(a)[widx[:10]]
array([ 16.54977315, 9.80192408, 9.66260559, 9.59051007,
9.53417205, 9.04132847, 8.97475516, 8.88960531,
8.67044986, 8.51607015])
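For intuition about why exactly these values end up with the lowest weights, here is a small sketch of the textbook Huber weight function: weight 1 inside the threshold, t/|z| outside. I am assuming that res.weights follows this form for HuberT; I have not checked it against the statsmodels source, and `huber_weight` is a name made up for this example. The location and scale are the values printed by res.params and res.scale above.

```python
def huber_weight(residual, scale, t=1.5):
    """Textbook Huber weight: 1 within t scaled units, t/|z| beyond."""
    z = abs(residual) / scale
    return 1.0 if z <= t else t / z

# Values copied from the RLM output above.
location, scale = 2.47711987, 2.5218278029435406

# The largest observation, two other high values, and the median of a.
for x in (16.54977315, 9.80192408, 8.51607015, 1.1503564468849041):
    print(x, round(huber_weight(x - location, scale), 3))
```

The further an observation lies above the location estimate, the smaller its weight, while values near the bulk of the data keep full weight.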
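I am not familiar with the implementation details of Huber's joint mean-scale estimate. One possible reason for the convergence failure is that the values are clustered in 3 groups, with an additional outlier at 16, which is visible when plotting a histogram. This might lead to a cycle in the iterative solver, in which the third group is alternately included and excluded. But this is just a guess.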
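A second possibility, going only by the RuntimeWarning in the question: the sqrt that produces the invalid value divides by `n * self.gamma - (a.shape[axis] - card) * self.c**2`. If too many points fall outside the threshold, that denominator goes negative, the scale becomes NaN, and no number of iterations can recover. The sketch below makes several assumptions that I have not verified against the source: that gamma is the usual Gaussian consistency constant E[min(Z^2, c^2)], that n is the sample size minus one, and that card counts the points within c scaled units. The function names are made up.

```python
import math

def huber_gamma(c=1.5):
    """E[min(Z**2, c**2)] for standard normal Z (assumed meaning of gamma)."""
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    inside = 2.0 * cdf - 1.0  # P(|Z| <= c)
    return inside + c * c * (1.0 - inside) - 2.0 * c * pdf

def scale_denominator(n_obs, n_inside, c=1.5):
    """Denominator from the traceback, assuming n = n_obs - 1."""
    return (n_obs - 1) * huber_gamma(c) - (n_obs - n_inside) * c * c

# Hypothetical sample of 100 points with a varying number inside the threshold.
for n_inside in (90, 70, 60):
    print(n_inside, round(scale_denominator(100, n_inside), 3))
```

Under these assumptions, with c=1.5 the denominator turns negative once roughly a third of the observations lie outside the threshold, which clustered data like this could plausibly trigger.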