Loading a pickled model with PySpark

Time: 2018-04-04 01:48:37

Tags: scikit-learn pyspark pickle

I'm trying to load a pickled model from S3 using PySpark and then use that model to make predictions. I can load the model, but when I try to pass it to the method that makes the predictions, I get the error PicklingError: Cannot pickle files that are not opened for reading. I've read the documentation on what can and cannot be pickled, but I can't find anything about this error.

Code that loads the model:

    rdd_pickle = spark.sparkContext.binaryFiles(model_path_in_s3)
    l = rdd_pickle.collect()
    pickle_text = l[0][1]
    self.model = pickle.loads(pickle_text)

The method in which the model makes the predictions:

    def turn_labeller(convo):
        """Annotate a conversation with turn labels.

        :param convo: Conversation whose turns haven't been labelled.
        :type convo: list of dict
        :param model: CRF model used to predict turn labels
        :type model: sklearn-crfsuite.CRF
        :return: The convo, now with labelled turns
        :rtype: list of dict
        """
        turn_features = [extract_turn_features(i, convo) for i in range(len(convo))]

        predicted_labels = model.predict_single(turn_features)
        for i, turn in enumerate(convo):
            if i == 0:
                turn["previous_turn_label"] = "__ROOT__"
            else:
                turn["previous_turn_label"] = predicted_labels[i-1]
            turn["turn_label"] = predicted_labels[i]
        return convo

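The method that does all the computations runs fine until the last flatMap, which calls turn_labeller. What is it about that call that causes the error?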

1 answer:

Answer 0: (score: 0)

For the record, my problem was simply that I hadn't added the @staticmethod decorator above the definition of the turn_labeller method. A simple fix.
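A likely explanation for why the decorator matters: PySpark pickles any function handed to flatMap so it can be sent to the executors. A regular method is bound to its instance, so pickling it also means pickling self and everything it holds. A static method is a plain function with no instance attached. The sketch below reproduces the idea with the standard library only; TurnPipeline, its driver_only attribute, and the two label methods are hypothetical stand-ins, not the asker's actual code, and a threading.Lock stands in for whatever unpicklable object the real instance held.

```python
import pickle
import threading


class TurnPipeline:
    """Hypothetical stand-in for the asker's class (name and fields assumed)."""

    def __init__(self):
        # Stand-in for driver-only state such as a SparkSession or an open
        # file handle; a lock cannot be pickled either.
        self.driver_only = threading.Lock()

    def label_bound(self, convo):
        # Regular method: any reference to it carries `self` along.
        return [dict(turn, turn_label="X") for turn in convo]

    @staticmethod
    def label_static(convo):
        # Static method: a plain function with no instance attached.
        return [dict(turn, turn_label="X") for turn in convo]


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


pipeline = TurnPipeline()
bound = pipeline.label_bound
static = pipeline.label_static

print(bound.__self__ is pipeline)   # True: the bound method holds the instance
print(hasattr(static, "__self__"))  # False: the static method holds nothing
print(can_pickle(pipeline))         # False: the instance cannot be serialized
```

Under that reading, passing a bound method to flatMap forces Spark to serialize the whole object, while the static version only needs the function itself. Any model the function uses still has to reach the executors some other way, for example as an argument or a broadcast variable.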