Cannot load a pyspark.ml model with Livy's python-api client or the pyspark shell

Asked: 2018-08-22 13:03:04

Tags: apache-spark pyspark hdfs pyspark-sql livy

I am trying to load a pyspark.ml model using Livy's python-api client (https://github.com/cloudera/livy/tree/master/python-api) with the following function:

def load_model(context):

    from pyspark.ml import PipelineModel
    path = "hdfs://master:54310/xxx/model"
    model = PipelineModel.load(path)

    """
    Use the model for predictions after loading
    """

    return 

I create an HttpClient and submit the function above to it as follows:

client = HttpClient("http://localhost:8998")
a = client.submit(load_model).result()
client.stop(True)

When I run this code, I get the following error:

Traceback (most recent call last):
      File "test.py", line 20, in <module>
        a = client.submit(load_model).result()
      File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 429, in result
        return self.__get_result()
      File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 381, in __get_result
        raise exception_type, self._exception, self._traceback
    Exception: org.apache.livy.repl.PythonJobException: Client job error:Traceback (most recent call last):
      File "/tmp/5744958999022937777", line 160, in processBypassJob
        result = deserialized_job(job_context)
      File "test.py", line 12, in load_model
      File "/home/hduser/spark/python/pyspark/ml/util.py", line 252, in load
        return cls.read().load(path)
      File "/home/hduser/spark/python/pyspark/ml/util.py", line 193, in load
        java_obj = self._jread.load(path)
      File "/home/hduser/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/home/hduser/spark/python/pyspark/sql/utils.py", line 79, in deco
        raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
    IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

When I check the Livy logs, I find this error:

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/hduser/livy-0.4.0-incubating-bin/bin/metastore_db.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore$6.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.RawStore.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)

Versions used:
Livy: livy-0.4.0-incubating
Spark: 2.1.0
Hadoop: 2.9.0

Livy is running with its default settings; following the installation steps, I set SPARK_HOME and HADOOP_CONF_DIR.

I also hit the same error when I run the following in the pyspark shell:

from pyspark.ml import PipelineModel
path = "hdfs://master:54310/xxx/model"
model = PipelineModel.load(path)

This happens even though no Spark applications are running, and even after stopping any that were. What changes do I need to make for this code to work?
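For reference, since the failure happens while instantiating `HiveSessionState` and nothing here actually needs Hive, one avenue I am considering is creating the session with Spark's in-memory catalog so that no embedded Derby metastore is opened at all. This is a sketch under that assumption (the config key is standard Spark; whether it resolves the contention in this Livy setup is untested, and the app name is illustrative):

```python
from pyspark.sql import SparkSession

# Use Spark's in-memory catalog instead of the Hive metastore, so no
# embedded Derby database is booted (and no db.lck contention can occur).
spark = (SparkSession.builder
         .appName("load-model-no-hive")
         .config("spark.sql.catalogImplementation", "in-memory")
         .getOrCreate())
```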

0 Answers:

There are no answers yet.