我正在尝试使用pyspark从腌制的模型生成预测,我使用以下命令获取模型
model = deserialize_python_object(filename)
其中deserialize_python_object(filename)
定义为:
import pickle
def deserialize_python_object(filename):
try:
with open(filename, ‘rb’) as f:
obj = pickle.load(f)
except:
obj = None
return obj
错误日志如下:
File “/Users/gmg/anaconda3/envs/env/lib**strong text**/python3.7/site-packages/pyspark/sql/udf.py”, line 189, in wrapper
return self(*args)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 167, in __call__
judf = self._judf
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 151, in _judf
self._judf_placeholder = self._create_judf()
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 160, in _create_judf
wrapped_func = _wrap_function(sc, self.func, self.returnType)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py”, line 35, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/rdd.py”, line 2420, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
File “/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/serializers.py”, line 600, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can’t pickle _abc_data objects
答案 0 :(得分:2)
似乎您遇到了与本期相同的问题: https://github.com/cloudpipe/cloudpickle/issues/180
正在发生的事情是pyspark的cloudpickle库在python 3.7中已经过时,您现在应该until pyspark gets that module updated修复此精心制作的补丁程序的问题。
尝试使用此替代方法:
安装cloudpickle pip install cloudpickle
将此添加到您的代码中:
import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle
monkeypatch积分https://github.com/cloudpipe/cloudpickle/issues/305