Using the Huber scale and location estimator in statsmodels

Date: 2017-10-06 12:45:34

Tags: python-3.x statsmodels

I want to use the Huber joint scale and location estimator described here: http://www.statsmodels.org/dev/generated/statsmodels.robust.scale.Huber.html, but I get this error:

In [1]: from statsmodels.robust.scale import huber

In [2]: huber([1,2,1000,3265,454])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-80c7d73a4467> in <module>()
----> 1 huber([1,2,1000,3265,454])

/usr/local/lib/python3.5/dist-packages/statsmodels/robust/scale.py in __call__(self, a, mu, initscale, axis)
    132         scale = tools.unsqueeze(scale, axis, a.shape)
    133         mu = tools.unsqueeze(mu, axis, a.shape)
--> 134         return self._estimate_both(a, scale, mu, axis, est_mu, n)
    135 
    136     def _estimate_both(self, a, scale, mu, axis, est_mu, n):

/usr/local/lib/python3.5/dist-packages/statsmodels/robust/scale.py in _estimate_both(self, a, scale, mu, axis, est_mu, n)
    176             else:
    177                 return nmu.squeeze(), nscale.squeeze()
--> 178         raise ValueError('joint estimation of location and scale failed to converge in %d iterations' % self.maxiter)
    179 
    180 huber = Huber()

ValueError: joint estimation of location and scale failed to converge in 30 iterations

Strangely, it depends on the input:

In [3]: huber([1,2,1000,3265])
Out[3]: (array(1067.0), array(1744.3785635989168))

Is this a bug, or am I doing something wrong here?

Thanks

Edit: I am aware of the tol and maxiter parameters, which is what you would point to in this case, but here is an example where they do not help:

In [1]: a = [4.3498776644415429, 16.549773154535362, 4.6335866963356445, 8.2581784707468771,
   ...:      1.3508951981036594, 1.2918098244960199, 5.7349939516388453, 0.41663442483143953,
   ...:      4.5632532990486077, 8.1020487048604473, 1.3823829480004797, 1.7848176927929804,
   ...:      4.3058348043423473, 0.9427710734983884, 0.95646846668018171, 0.75309469901235238,
   ...:      8.4689505489677011, 0.77420558084543778, 0.76506022382450845, 1.5673666392992407,
   ...:      1.4109878442590897, 0.45592078018861532, 4.71748181503082, 0.65942167325205436,
   ...:      0.19099796838644958, 1.0979997466466069, 4.8145761128848106, 0.75417363824157768,
   ...:      5.0723603274833362, 0.30627007428414721, 4.8178689054947981, 1.5383475959362511,
   ...:      0.7971041296695851, 4.689826268915076, 8.6704498595703274, 0.56825576954483947,
   ...:      8.0383098149129708, 0.39400084281108422, 0.89827542590321019, 8.5160701523615785,
   ...:      9.0413284666560934, 1.3590549231652516, 8.355489609767794, 4.2413169378427682,
   ...:      4.8497143419119348, 4.8566372637376292, 0.80979444214378904, 0.26613505510736446,
   ...:      1.1525345100417608, 4.9784132426823824, 1.0766360391211101, 1.9604545887151259,
   ...:      0.77151237419054963, 1.2302626325699455, 0.846912462599126, 0.85852710339862037,
   ...:      0.38035542024830299, 4.7586522644359093, 0.46796412732813891, 0.52933680009769146,
   ...:      5.2521765047159708, 0.71915381047435945, 1.3502865819436387, 0.76647272458736559,
   ...:      1.1206637428992841, 0.72560665950851866, 4.4248008256265781, 4.7984989298357457,
   ...:      1.0696617588880453, 0.71104701759920497, 0.46986438176394463, 0.71008686283792688,
   ...:      0.40698839770374351, 1.0015132141773508, 1.3825224746094535, 0.93256270304709066,
   ...:      8.8896053101317687, 0.64148877800521564, 0.69250319745644506, 4.7187793763802919,
   ...:      5.0620089438920939, 5.1710564773987233, 9.5341720525579809, 0.43052713463119635,
   ...:      0.79288845392647533, 0.51059695992994469, 0.48295891743804287, 0.93370512281086504,
   ...:      1.7493284310512855, 0.62744557356984221, 5.0965146009791704, 0.12615625248684664,
   ...:      1.1064189602023351, 0.33183381198282491, 4.9032450273833179, 0.90296573725985785,
   ...:      1.2885647882049298, 0.84669066664867576, 1.1481783837280477, 0.94784483590946278,
   ...:      9.8019240792478755, 0.91501030105202807, 0.57121190468293803, 5.5511993201050887,
   ...:      0.66054793663263078, 9.6626055869916065, 5.2628061618536908, 9.5905100705465696,
   ...:      0.70369230764306401, 8.9747551552440186, 1.572014845182425, 1.9571634928868149,
   ...:      0.62030418652298325, 0.3395356767840213, 0.48287760518144929, 4.7937042347984198,
   ...:      0.74251393675618682, 0.87369567300592954, 4.5381205696031586, 5.2673192797619084]

In [2]: from statsmodels.robust.scale import huber, Huber

In [3]: Huber(maxiter=10000,tol=1e-1)(a)
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py:168: RuntimeWarning: invalid value encountered in sqrt
  / (n * self.gamma - (a.shape[axis] - card) * self.c**2))
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py:164: RuntimeWarning: invalid value encountered in less_equal
  subset = np.less_equal(np.fabs((a - mu)/scale), self.c)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-4b9929ff84bb> in <module>()
----> 1 Huber(maxiter=10000,tol=1e-1)(a)

/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py in __call__(self, a, mu, initscale, axis)
    132         scale = tools.unsqueeze(scale, axis, a.shape)
    133         mu = tools.unsqueeze(mu, axis, a.shape)
--> 134         return self._estimate_both(a, scale, mu, axis, est_mu, n)
    135 
    136     def _estimate_both(self, a, scale, mu, axis, est_mu, n):

/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py in _estimate_both(self, a, scale, mu, axis, est_mu, n)
    176             else:
    177                 return nmu.squeeze(), nscale.squeeze()
--> 178         raise ValueError('joint estimation of location and scale failed to converge in %d iterations' % self.maxiter)
    179 
    180 huber = Huber()

ValueError: joint estimation of location and scale failed to converge in 10000 iterations

Sorry, this was my original error, but since "a" is very long, I tried to reproduce it with a smaller array. In this case I do not think maxiter and tol are the culprits.

1 answer:

Answer 0 (score: 0)

When using the Huber class, the allowed number of iterations, maxiter, can be changed.

e.g. this works:

>>> from statsmodels.robust.scale import huber, Huber
>>> Huber(maxiter=200)([1,2,1000,3265,454])
(array(925.6483958529737), array(1497.0624070525248))

When using the class, the threshold parameter of the norm function can also be changed. In a very small sample like this, the estimates can be very sensitive to that threshold.

As an alternative, we can use the RLM model with a regression on only a constant. Both the thresholds and the algorithm differ, but it should produce similar robust results. In the new example, the scale estimate lies between the standard deviation and the robust MAD, while the location estimate lies between the median and the mean.

>>> import numpy as np
>>> from statsmodels.api import RLM
>>> from statsmodels.robust import norms, scale
>>> res = RLM(a, np.ones(len(a)), M=norms.HuberT(t=1.5)).fit(scale_est=scale.HuberScale(d=1.5))
>>> res.params, res.scale
(array([ 2.47711987]), 2.5218278029435406)

>>> np.median(a), scale.mad(a)
(1.1503564468849041, 0.98954533464908301)

>>> np.mean(a), np.std(a)
(2.8650886010542269, 3.0657561979615977)

The resulting weights show that some of the high values are strongly downweighted:

>>> widx = np.argsort(res.weights)
>>> np.asarray(a)[widx[:10]]
array([ 16.54977315,   9.80192408,   9.66260559,   9.59051007,
         9.53417205,   9.04132847,   8.97475516,   8.88960531,
         8.67044986,   8.51607015])

I am not familiar with the implementation details of the joint Huber mean-scale estimation. One possible reason for the convergence failure is that the distribution of the values clusters into three groups, with one extra outlier at 16, which is visible when plotting a histogram. This could produce a convergence cycle in the iterative solver that alternately includes or excludes the third group. But that is just a guess.
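Given that the failure depends on the data, a pragmatic workaround (a hypothetical helper, not part of statsmodels) is to catch the convergence error and fall back to the non-iterative median/MAD pair, which always produces an answer:

```python
import numpy as np
from statsmodels.robust.scale import Huber, mad

def robust_loc_scale(x, **huber_kwargs):
    """Hypothetical helper: try Huber's joint location/scale estimator
    and fall back to median/MAD if the iteration fails to converge."""
    x = np.asarray(x, dtype=float)
    try:
        loc, scl = Huber(**huber_kwargs)(x)
        return float(loc), float(scl)
    except ValueError:
        # Non-iterative robust estimates; mad() is normalized to be
        # consistent for the normal distribution.
        return float(np.median(x)), float(mad(x))

# The small array from the question that failed to converge:
loc, scl = robust_loc_scale([1, 2, 1000, 3265, 454])
print(loc, scl)
```

This trades some statistical efficiency for robustness of the computation itself; the median/MAD pair has the same breakdown point but is noisier on well-behaved data.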