scikit-learn - Random Forest Regressor - AttributeError: 'Thread' object has no attribute '_children'

Date: 2015-09-30 23:50:38

Tags: python flask scikit-learn

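When I set the n_jobs parameter to a value greater than 1 for the random forest regressor, I get the following error. If I set n_jobs=1, everything works fine.

AttributeError: 'Thread' object has no attribute '_children'

I'm running this code in a Flask service. Interestingly, the error does not occur when the code runs outside of the Flask service. So far I have only reproduced it on a freshly installed Ubuntu box. On my Mac it works just fine.

Here is a thread that discusses this issue, but it doesn't seem to go beyond a workaround: 'Thread' object has no attribute '_children' - django + scikit-learn

Any thoughts on this?

Thanks, everyone!

Here is my test code: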

    @test.route('/testfun')
    def testfun():
        from sklearn.ensemble import RandomForestRegressor
        import numpy as np

        train_data = np.array([[1,2,3], [2,1,3]])
        target_data = np.array([1,1])

        model = RandomForestRegressor(n_jobs=2)
        model.fit(train_data, target_data)
        return "yey"

Stack trace:


    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1836, in __call__
        return self.wsgi_app(environ, start_response)
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1820, in wsgi_app
        response = self.make_response(self.handle_exception(e))
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1403, in handle_exception
        reraise(exc_type, exc_value, tb)
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
        response = self.full_dispatch_request()
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
        rv = self.dispatch_request()
      File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/vagrant/flask.global-relevance-engine/global_relevance_engine/routes/test.py", line 47, in testfun
        model.fit(train_data, target_data)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/forest.py", line 273, in fit
        for i, t in enumerate(trees))
      File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 574, in __call__
        self._pool = ThreadPool(n_jobs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 685, in __init__
        Pool.__init__(self, processes, initializer, initargs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 136, in __init__
        self._repopulate_pool()
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 199, in _repopulate_pool
        w.start()
      File "/usr/lib/python2.7/multiprocessing/dummy/__init__.py", line 73, in start
        self._parent._children[self] = None

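1 answer:

Answer 0: (score: 6)

The problem

This is probably due to a bug in multiprocessing.dummy (see here and here) that existed in Python before 2.7.5 and 3.3.2.

Solution A - Upgrade Python

See the comments for confirmation that a newer version worked for the OP.

Solution B - Modify dummy

If you can't upgrade but have access to .../py/Lib/multiprocessing/dummy/__init__.py, edit the start method of the DummyProcess class as follows (it should be around line 73):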
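
To make the failing condition concrete, here is a minimal sketch. It rests on an assumption the answer does not state: that the Flask server handles the request in a thread that was created with plain threading.Thread, so that thread has no _children attribute. The names work_in_plain_thread and results are made up for illustration. On an interpreter that already contains the fix the pool starts normally; on an affected interpreter the ThreadPool(2) line is where the AttributeError would be raised.

```python
import threading
from multiprocessing.pool import ThreadPool

results = {}  # hypothetical holder for what the worker thread observes


def work_in_plain_thread():
    # A thread created directly via threading.Thread does not carry
    # the _children attribute that multiprocessing.dummy expects.
    current = threading.current_thread()
    results["has_children"] = hasattr(current, "_children")

    # Creating a ThreadPool here starts DummyProcess workers whose
    # parent is this plain thread.
    pool = ThreadPool(2)
    try:
        results["squares"] = pool.map(lambda x: x * x, [1, 2, 3])
    finally:
        pool.close()
        pool.join()


t = threading.Thread(target=work_in_plain_thread)
t.start()
t.join()
```

If that assumption holds, it would also explain why the same code works in a standalone script, where fit() is called from the main thread.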

if hasattr(self._parent, '_children'):  # add this line
    self._parent._children[self] = None  # indent this existing line

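Solution C - Monkey Patch

DummyProcess is where this bug lives. Let's look at where it sits in the code you are importing, to make sure we patch it in the right place.

  • RandomForestRegressor
  • inherits from: ForestRegressor
  • inherits from: BaseForest
  • created in: sklearn.ensemble.forest
  • imports: from sklearn.externals.joblib
  • Parallel
  • from multiprocessing.pool
  • imports ThreadPool
  • from multiprocessing.dummy
  • imports and stores Process
  • is assigned to: DummyProcess, also in multiprocessing.dummy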

The presence of DummyProcess in that chain guarantees that it has already been imported once RandomForestRegressor has been imported. Also, I think we can access the DummyProcess class before any instances have been created. So we can patch the class once, instead of having to hunt down instances to patch.

    # Let's make it available in our namespace:
    from sklearn.ensemble import RandomForestRegressor
    from multiprocessing import dummy as __mp_dummy

    # Now we can define a replacement and patch DummyProcess:
    def __DummyProcess_start_patch(self):  # pulled from an updated version of Python
        assert self._parent is __mp_dummy.current_process()  # modified to avoid further imports
        self._start_called = True
        if hasattr(self._parent, '_children'):
            self._parent._children[self] = None
        __mp_dummy.threading.Thread.start(self)  # modified to avoid further imports
    __mp_dummy.DummyProcess.start = __DummyProcess_start_patch

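Unless I've missed something, every DummyProcess instance created from now on will be patched, so the error will not occur.

For anyone who uses sklearn more extensively, I think you can do this in reverse and make it work for all of sklearn rather than focusing on one module. You would import DummyProcess and patch it before doing any sklearn imports. sklearn will then use the patched class from the start.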
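
A rough sketch of that reverse ordering follows. The helper name patch_dummy_process and the _is_patched marker are made up for illustration, and the sklearn import is left as a comment so the snippet runs on its own. On an interpreter that already has the fix, the patch only replaces start with an equivalent method.

```python
import threading
import multiprocessing.dummy as mp_dummy


def patch_dummy_process():
    """Make DummyProcess.start tolerate a parent thread without _children."""
    # Hypothetical marker attribute so the patch is applied only once.
    if getattr(mp_dummy.DummyProcess.start, "_is_patched", False):
        return

    def start(self):
        assert self._parent is mp_dummy.current_process()
        self._start_called = True
        # Register the child only if the parent can track children.
        if hasattr(self._parent, "_children"):
            self._parent._children[self] = None
        threading.Thread.start(self)

    start._is_patched = True
    mp_dummy.DummyProcess.start = start


# Apply the patch first ...
patch_dummy_process()

# ... and import scikit-learn only afterwards, so that it picks up the
# patched class from the start, for example:
# from sklearn.ensemble import RandomForestRegressor
```

Calling patch_dummy_process() at the very top of the application's entry module, before anything imports sklearn, should be enough, since multiprocessing.pool reuses the same DummyProcess class.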

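Original answer:

While I was writing a comment, I realised I may have found your problem: I think your Flask environment is using an older version of Python.

The reasoning is that in the latest version of Python's multiprocessing, the line where you get that error is guarded by a condition: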

if hasattr(self._parent, '_children'):
    self._parent._children[self] = None

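It looks like this bug was fixed during the Python 2.7 series (I think from 2.7.5 onwards). Maybe your Flask is running on an older 2.7 or on 2.6?

Can you check your environment? If you can't update the interpreter, maybe we can find a way to patch multiprocessing so that it doesn't crash.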
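
One quick way to check is to run something like the sketch below inside the Flask process itself, for example from a temporary route. The function name has_dummy_process_fix is made up, and the version thresholds are simply the ones mentioned in this answer (2.7.5 and 3.3.2), which I have not verified independently.

```python
import sys


def has_dummy_process_fix(version_info=sys.version_info):
    """Return True if the interpreter is at or past the versions named above."""
    version = tuple(version_info[:3])
    if version[0] == 2:
        # Thresholds are taken from the answer, not checked against a changelog.
        return version >= (2, 7, 5)
    return version >= (3, 3, 2)


# Report which interpreter this process is actually running on.
print(sys.executable)
print(sys.version)
print(has_dummy_process_fix())
```

Printing sys.executable as well helps confirm that the service is using the interpreter you expect, rather than a different system or virtualenv Python.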