I am trying to load a pyspark.ml model through Livy's python-api client (https://github.com/cloudera/livy/tree/master/python-api), using the following function:
def load_model(context):
    from pyspark.ml import PipelineModel
    path = "hdfs://master:54310/xxx/model"
    model = PipelineModel.load(path)
    """
    Use the model for predictions after loading
    """
    return
I create an HttpClient and submit the function above to it, like this:
from livy.client import HttpClient

client = HttpClient("http://localhost:8998")
a = client.submit(load_model).result()
client.stop(True)
When I run the code above, I get the following error:
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    a = client.submit(load_model).result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 429, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 381, in __get_result
    raise exception_type, self._exception, self._traceback
Exception: org.apache.livy.repl.PythonJobException: Client job error:Traceback (most recent call last):
  File "/tmp/5744958999022937777", line 160, in processBypassJob
    result = deserialized_job(job_context)
  File "test.py", line 12, in load_model
  File "/home/hduser/spark/python/pyspark/ml/util.py", line 252, in load
    return cls.read().load(path)
  File "/home/hduser/spark/python/pyspark/ml/util.py", line 193, in load
    java_obj = self._jread.load(path)
  File "/home/hduser/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/hduser/spark/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
When I check the Livy logs, I find the following error:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/hduser/livy-0.4.0-incubating-bin/bin/metastore_db.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore$6.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.RawStore.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
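Since XSDB6 is Derby's way of saying that another JVM already holds the database, one thing that could be checked is whether lock files are still sitting in the metastore_db directory named in the log. The following is only a minimal sketch: the helper name find_derby_locks is hypothetical, and it assumes the usual embedded-Derby lock file names (db.lck and dbex.lck). It only reports what it finds and does not delete anything.

```python
import os

# Assumption: lock file names that embedded Derby normally creates
# inside a database directory while a JVM has it booted.
DERBY_LOCK_FILES = ("db.lck", "dbex.lck")


def find_derby_locks(metastore_dir):
    """Return the paths of Derby lock files that exist in metastore_dir."""
    return [
        os.path.join(metastore_dir, name)
        for name in DERBY_LOCK_FILES
        if os.path.exists(os.path.join(metastore_dir, name))
    ]


if __name__ == "__main__":
    # Path taken from the Livy log above.
    path = "/home/hduser/livy-0.4.0-incubating-bin/bin/metastore_db"
    print(find_derby_locks(path))
```

A non-empty result would suggest that some process booted the database from that directory and either still holds it or did not shut down cleanly.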
Versions used:
Livy: livy-0.4.0-incubating
Spark: 2.1.0
Hadoop: 2.9.0
Livy is running with its default settings; I set SPARK_HOME and HADOOP_CONF_DIR as described in the installation steps.
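For anyone reproducing this setup, a quick way to confirm that both variables are actually visible to a process is sketched below. The helper name missing_env_vars is hypothetical and not part of Livy or Spark; it only checks that the variables are set, not that the paths they point to are valid.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # The two variables mentioned in the Livy installation steps.
    print(missing_env_vars(["SPARK_HOME", "HADOOP_CONF_DIR"]))
```

An empty list means both variables are set in the environment the script runs in.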
In addition, I get the same error when I run the following commands in the pyspark shell:
from pyspark.ml import PipelineModel
path = "hdfs://master:54310/xxx/model"
model = PipelineModel.load(path)
No Spark application is running, and the error still occurs even after stopping any that were. What changes do I need to make to get this code working?