python luigi LocalTarget pickle

Time: 2017-06-07 15:46:33

Tags: python-2.7 pickle luigi

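I am running on Windows 7 with Python 2.7 (via Anaconda 4.3.17), Luigi 2.4.0, Pandas 0.18, and sklearn 0.18. Below, I am trying to make a luigi.LocalTarget output a pickle that stores several different objects (in firstJob), and then read that pickle in a dependent job (secondJob). firstJob completes successfully if I run the following from the command line: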

python -m luigi --module luigiPickle firstJob --date 2017-06-07 --local-scheduler

However, if I try to run secondJob, i.e.

python -m luigi --module luigiPickle secondJob --date 2017-06-07 --local-scheduler

I get

Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\luigi-2.4.0-py2.7.egg\luigi\worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "C:\Anaconda2\lib\site-packages\luigi-2.4.0-py2.7.egg\luigi\worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File "luigiPickle.py", line 41, in run
    ret2 = pickle.load(inFile)
  File "C:\Anaconda2\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "C:\Anaconda2\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Anaconda2\lib\pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "C:\Anaconda2\lib\pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named frame

Luigi seems unable to read the pickle because it does not recognize the pandas.DataFrame() object (perhaps a scope issue?).

import luigi
import pandas as pd
import pickle
from sklearn.linear_model import LinearRegression

class firstJob(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget('%s_first.pickle' % self.date)

    def run(self):
        ret = {}
        ret['a'] = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
        ret['b'] = pd.DataFrame({'a': [3, 4], 'd': [0, 0]})
        ret['c'] = LinearRegression()
        outFile = self.output().open('wb')
        pickle.dump(ret, outFile, protocol=pickle.HIGHEST_PROTOCOL)
        outFile.close()

class secondJob(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return firstJob(self.date)

    def output(self):
        return luigi.LocalTarget('%s_second.pickle' % self.date)

    def run(self):
        inFile = self.input().open('rb')
        ret2 = pickle.load(inFile)
        inFile.close()

if __name__ == '__main__':
    luigi.run()

2 answers:

Answer 0 (score: 5)

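The luigi open command does not work with the binary b flag: it strips the b out of the mode string (I don't know why). It is better to just use the standard open with the target's path attribute: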

open(self.input().path, 'rb')
open(self.output().path, 'wb')
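A minimal sketch of the path-based approach, plus an illustration of why a text-mode stream can break a binary pickle. Assumptions: PathOnlyTarget is a hypothetical stand-in for luigi.LocalTarget (only its path attribute is used, so the example does not need luigi installed), fractions.Fraction stands in for pandas.DataFrame because both are pickled with a reference to their module and class name, and simulate_newline_translation is a hypothetical helper that mimics a Windows text-mode write (\n becomes \r\n). That translation is one plausible way the bytes get damaged, not a confirmed trace of what luigi 2.4.0 does internally.

```python
import os
import pickle
import shutil
import tempfile
from fractions import Fraction


class PathOnlyTarget(object):
    """Hypothetical stand-in for luigi.LocalTarget: only the .path attribute is used."""

    def __init__(self, path):
        self.path = path


def dump_to_target(target, obj):
    # Skip target.open() and open the file by path in explicit binary mode.
    with open(target.path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_from_target(target):
    # Read back in binary mode, again going through the plain path.
    with open(target.path, "rb") as f:
        return pickle.load(f)


def simulate_newline_translation(data):
    # Illustrative assumption: mimic a Windows text-mode write, which turns b"\n" into b"\r\n".
    return data.replace(b"\n", b"\r\n")


# Fraction stands in for pandas.DataFrame: both are pickled via a module + class name reference.
payload = {"a": Fraction(1, 3), "b": [1, 2, 3]}

tmpdir = tempfile.mkdtemp()
try:
    target = PathOnlyTarget(os.path.join(tmpdir, "2017-06-07_first.pickle"))
    dump_to_target(target, payload)
    restored = load_from_target(target)
finally:
    shutil.rmtree(tmpdir)
print(restored == payload)  # True

# Protocol 2 (HIGHEST_PROTOCOL on Python 2.7) stores class references as "module\nname\n" lines.
corrupted = simulate_newline_translation(pickle.dumps(payload, protocol=2))
load_error = None
try:
    pickle.loads(corrupted)
except ImportError as exc:
    # The module name is read back as "fractions\r", which cannot be imported.
    load_error = exc
print(repr(load_error))
```

The first half follows the answer's advice: open the file by path with 'wb' and 'rb'. The second half only shows how newline translation can corrupt a binary pickle so that the failure surfaces in find_class as an ImportError, the same kind of error as in the question's traceback. One trade-off of bypassing the target: luigi's LocalTarget.open('w') writes to a temporary file and moves it into place on close, which a plain open() does not do.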

Answer 1 (score: 0)

d6tflow solves this problem; see the example for sklearn model pickle, which answers this question. In addition, you don't need to write all the boilerplate code.
