Hyperopt MongoTrials issue with pickle: AttributeError: 'module' object has no attribute

Time: 2016-12-29 05:55:01

Tags: mongodb python-2.7

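I'm trying to use Hyperopt's parallel search with MongoDB and have run into some problems with MongoTrials, which have already been discussed here. I've tried everything, but I still can't find a solution to my specific problem. The model I'm trying to optimize is sklearn's RandomForestRegressor.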

I've followed this tutorial, and I'm able to print the computed "fmin" result without any problems.

Here are my steps so far:

1) Activate the virtual environment named "tensorflow" (all the libraries are installed there)

2) Start MongoDB:

(tensorflow) bash-3.2$ mongod --dbpath . --port 1234 --directoryperdb --journal --nohttpinterface

3) Start a worker:

(tensorflow) bash-3.2$ hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1

4) Run my Python code, which is as follows:

import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error

from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials


# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train

# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
    train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes

# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])


space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }

trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')

def minMe(params):
    # Hyperopt tuning for hyperparameters
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestRegressor
    from hyperopt import STATUS_OK

    try:
        import dill as pickle
        print('Went with dill')
    except ImportError:
        import pickle

    def hyperopt_rf(params):
        rf = RandomForestRegressor(**params)
        return cross_val_score(rf, train_xg_x, train_xg_y).mean()

    acc = hyperopt_rf(params)
    print 'new acc:', acc, 'params: ', params
    return {'loss': -acc, 'status': STATUS_OK}

best = fmin(fn=minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best

5) After running the Python code above, I get the following error:

INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:no job found, sleeping for 0.7s
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:exiting with N=9223372036854775803 after 4 consecutive exceptions

6) Then the Mongo worker shuts down.

Things I've tried:

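  • Installing "dill" as the error message suggests -> didn't work
  • Moving the global imports inside the objective function so that it can be pickled -> didn't work
  • Adding a try/except that imports either "dill" or "pickle" -> didn't work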

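Has anyone had a similar problem? I've run out of ideas and have been stuck on this for 2 days. I suspect I'm missing something very simple, but I can't find it. What am I missing? Any suggestions are welcome!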

4 Answers:

Answer 0 (score: 7)

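I ran into the same problem with Python 3.5. Installing dill didn't help, and neither did setting workdir in MongoTrials or in the hyperopt-mongo-worker CLI. hyperopt-mongo-worker doesn't seem to have access to the __main__ module in which the function is defined: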

AttributeError: Can't get attribute 'minMe' on <module '__main__' from ...hyperopt-mongo-worker
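The error above follows from how the standard pickle module handles functions: it stores only a reference (the defining module's name plus the function's name), and unpickling imports that module and looks the name up again. The following is a minimal Python 3 sketch of that mechanism. It uses a throwaway in-memory module called objective_mod, a hypothetical name that is not part of hyperopt, to stand in for the driver script's __main__:

```python
import pickle
import sys
import types

# Hypothetical stand-in for the driver script: an in-memory module named "objective_mod"
mod = types.ModuleType("objective_mod")
exec(
    "def minMe(params):\n"
    "    return {'loss': -params['x'], 'status': 'ok'}\n",
    mod.__dict__,
)
sys.modules["objective_mod"] = mod

# pickle records where to find the function (module + name), not its code
blob = pickle.dumps(mod.minMe)
print(b"objective_mod" in blob, b"minMe" in blob)  # True True

# While the module really has the attribute, the reference resolves
restored = pickle.loads(blob)
print(restored({"x": 3}))  # {'loss': -3, 'status': 'ok'}

# Simulate the worker side: the module exists but has no minMe attribute
del mod.minMe
unpickle_error = None
try:
    pickle.loads(blob)
except AttributeError as exc:
    unpickle_error = exc  # keep a reference; `exc` is cleared after the block
print(type(unpickle_error).__name__, unpickle_error)
```

In the real setup, the name being looked up is minMe on __main__. Inside the worker, __main__ is the hyperopt-mongo-worker script, which presumably has no such attribute, so the lookup fails in the same way.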

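As @jaikumarm suggested, I worked around the problem by writing a module file that contains all the required functions. However, instead of symlinking it into the bin directory, I extended PYTHONPATH before running hyperopt-mongo-worker: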

export PYTHONPATH="${PYTHONPATH}:<dir_with_the_module.py>"
hyperopt-mongo-worker ...

This way, hyperopt-mongo-worker can import the module that contains minMe.
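The PYTHONPATH workaround can be checked without MongoDB. The sketch below requires Python 3.7+. It writes a hypothetical module optimizer_demo.py into a temporary directory, imports it, and pickles its minMe on the "driver" side. It then unpickles that function in a fresh interpreter that stands in for hyperopt-mongo-worker, once without and once with that directory on PYTHONPATH. The file name, function body, and helper run_worker are illustrative assumptions. Only the PYTHONPATH mechanism comes from the answer.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile

# Hypothetical module file standing in for the answer's module with minMe
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "optimizer_demo.py"), "w") as fh:
    fh.write("def minMe(params):\n    return {'loss': -params['x'], 'status': 'ok'}\n")

# "Driver" side: import the module and pickle its function (stored by reference)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
optimizer_demo = importlib.import_module("optimizer_demo")
blob = pickle.dumps(optimizer_demo.minMe)

# "Worker" side: a separate interpreter that only unpickles and calls the function
WORKER_CODE = (
    "import pickle, sys\n"
    "f = pickle.loads(sys.stdin.buffer.read())\n"
    "print(f({'x': 2}))\n"
)

def run_worker(extra_path=None):
    """Run the stand-in worker, optionally appending extra_path to PYTHONPATH."""
    env = dict(os.environ)
    if extra_path:
        # Same effect as: export PYTHONPATH="${PYTHONPATH}:<dir>"
        parts = [env.get("PYTHONPATH", ""), extra_path]
        env["PYTHONPATH"] = os.pathsep.join(p for p in parts if p)
    return subprocess.run([sys.executable, "-c", WORKER_CODE],
                          input=blob, capture_output=True, env=env)

without_path = run_worker()
with_path = run_worker(module_dir)
print(without_path.returncode != 0)       # True: optimizer_demo is not importable
print(with_path.stdout.decode().strip())  # {'loss': -2, 'status': 'ok'}
```

The only difference between the two runs is the environment variable. This mirrors the answer: the worker process has to be able to import the module by name, regardless of how the driver loaded it.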

Answer 1 (score: 4)

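I struggled with this for several days before coming up with a working solution. There are two problems:
1. The mongo worker spawns a separate process to run the optimizer, so any context from the original Python file is lost and is not available to this new process.
2. Imports in this new process happen in the context of the hyperopt-mongo-worker script, which in your case is /Users/WernerChao/tensorflow/bin/.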

So my solution is to make the new optimizer function completely self-contained:

optimizer.py

import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error

# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train

# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
    train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes

# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])



def minMe(params):
    # Hyperopt tuning for hyperparameters
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestRegressor
    from hyperopt import STATUS_OK

    try:
        import dill as pickle
        print('Went with dill')
    except ImportError:
        import pickle

    def hyperopt_rf(params):
        rf = RandomForestRegressor(**params)
        return cross_val_score(rf, train_xg_x, train_xg_y).mean()

    acc = hyperopt_rf(params)
    print 'new acc:', acc, 'params: ', params
    return {'loss': -acc, 'status': STATUS_OK}

wrapper.py

from hyperopt import hp, fmin, tpe
from hyperopt.mongoexp import MongoTrials

import optimizer

space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }

# trials must be defined before it is passed to fmin
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')

best = fmin(fn=optimizer.minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best

With this code in place, symlink optimizer.py into the bin folder:

ln -s /Users/WernerChao/Git/test/optimizer.py /Users/WernerChao/tensorflow/bin/

Now run wrapper.py and then the mongo worker. The worker should be able to import optimizer from its local context and run the minMe function.
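The symlink helps because Python prepends the script's own directory to sys.path when it runs a script. A module placed next to hyperopt-mongo-worker in bin/ can therefore be imported by name, no matter which directory the worker is started from. The following is a small sketch of that behaviour (Python 3.7+). All names are hypothetical: fake_worker.py stands in for hyperopt-mongo-worker, optimizer_demo.py stands in for optimizer.py, and a regular file replaces the symlink.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical layout: bin_dir plays the role of .../tensorflow/bin/
bin_dir = tempfile.mkdtemp()
other_dir = tempfile.mkdtemp()  # unrelated working directory for the run

# A module next to the "worker" script (a plain file instead of the answer's symlink)
with open(os.path.join(bin_dir, "optimizer_demo.py"), "w") as fh:
    fh.write("def minMe(params):\n    return {'loss': -params['x'], 'status': 'ok'}\n")

# A fake worker script that imports its sibling module by bare name
with open(os.path.join(bin_dir, "fake_worker.py"), "w") as fh:
    fh.write("import optimizer_demo\nprint(optimizer_demo.minMe({'x': 5}))\n")

# PYTHONSAFEPATH (Python 3.11+) would stop the script directory being prepended, so drop it
env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
result = subprocess.run(
    [sys.executable, os.path.join(bin_dir, "fake_worker.py")],
    cwd=other_dir, capture_output=True, text=True, env=env,
)
print(result.stdout.strip())  # {'loss': -5, 'status': 'ok'}
```

The run succeeds even though the working directory does not contain the module. This matches the answer's second point: imports in the worker resolve relative to the worker script's location.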

Answer 2 (score: 0)

Try installing dill into the Python environment of tensorflow (or possibly of the worker):

/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt

Your goal is to get rid of the hyperopt error message:

hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.

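This is because, by default, Python's pickle cannot serialize a function by value; it only stores a reference to it by name. The dill library extends Python's pickle module to serialize and deserialize many more kinds of Python objects. In your case, the worker cannot restore your function minMe() from its pickled form.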

Answer 3 (score: 0)

I made a separate file that computes the loss and copied it into /anaconda2/bin/ and /anaconda2/lib/python2.7/site-packages/hyperopt. It works fine.

Here is my traceback:

Traceback (most recent call last):
  File "/home/greatskull/anaconda2/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1073, in run_one
    with temp_dir(workdir, erase_created_workdir), working_dir(workdir):
  File "/home/greatskull/anaconda2/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/utils.py", line 229, in temp_dir
    os.makedirs(dir)
  File "/home/greatskull/anaconda2/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/home/greatskull/anaconda2/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)