I'm using Spark 2.0.0 and Oozie 4.2.0. I'm trying to run a Spark job through Oozie and I get this error:
File "/mnt/yarn/usercache/hadoop/appcache/application_1473318730987_0107/container_1473318730987_0107_02_000001/pyspark.zip/pyspark/sql/context.py", line 481, in __init__
File "/mnt/yarn/usercache/hadoop/appcache/application_1473318730987_0107/container_1473318730987_0107_02_000001/pyspark.zip/pyspark/sql/session.py", line 177, in getOrCreate
File "/mnt/yarn/usercache/hadoop/appcache/application_1473318730987_0107/container_1473318730987_0107_02_000001/pyspark.zip/pyspark/sql/session.py", line 211, in __init__
TypeError: 'JavaPackage' object is not callable
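The call that blows up (session.py line 211) boils down to something like this; I'm paraphrasing the Spark 2.0 pyspark source, so treat it as a sketch:

# Inside SparkSession.__init__ (paraphrased from pyspark/sql/session.py):
if jsparkSession is None:
    # self._jvm.SparkSession should resolve to the JVM class
    # org.apache.spark.sql.SparkSession; when it doesn't, calling it fails
    # with TypeError: 'JavaPackage' object is not callable.
    jsparkSession = self._jvm.SparkSession(self._jsc.sc())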
pyspark/sql/session.py is trying to instantiate sc._jvm.SparkSession, but that name doesn't resolve to a class, so the call fails. The same job works with spark-submit, so I wrote a small script, get_session.py, to see what's different:
#!/usr/bin/env python
from pyspark import SparkContext
sc = SparkContext()
print "sc._jvm.SparkSession:", sc._jvm.SparkSession
With spark-submit:
$ spark-submit --master yarn --deploy-mode cluster get_session.py
...
sc._jvm.SparkSession <py4j.java_gateway.JavaClass object at 0x7f7e8194f850>
...
When it's called from an Oozie workflow:
<workflow-app name="testing" xmlns="uri:oozie:workflow:0.4">
    <start to="initSystem"/>
    <action name="initSystem">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                    <value>SPARK_HOME=/usr/lib/spark/</value>
                </property>
                <property>
                    <name>oozie.launcher.mapred.child.env</name>
                    <value>PYSPARK_ARCHIVES_PATH=pyspark.zip</value>
                </property>
            </configuration>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>testing</name>
            <class></class>
            <jar>${workflowPath}/get_session.py</jar>
            <spark-opts>--py-files py4j-src.zip,pyspark.zip</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <end name="end"/>
</workflow-app>
The output is:
sc._jvm.SparkSession: <py4j.java_gateway.JavaPackage object at 0x7fc8eb1f8b50>
Note that sc._jvm.SparkSession is a py4j.java_gateway.JavaClass in the first case (which is what it should be), but a py4j.java_gateway.JavaPackage in the second; that's the problem, because JavaPackage is the generic placeholder py4j returns when the requested name can't be found on the JVM side.
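You can reproduce the same TypeError by asking sc._jvm for a name that certainly doesn't exist; at least with the py4j bundled with Spark 2.0, unresolvable names come back as a JavaPackage rather than raising an error (org.example.DoesNotExist below is made up for illustration):

# Any dotted name py4j can't resolve to a JVM class is returned as a
# JavaPackage placeholder; calling it raises the TypeError shown above.
bogus = sc._jvm.org.example.DoesNotExist  # hypothetical, non-existent class
print type(bogus)  # <class 'py4j.java_gateway.JavaPackage'>
bogus()            # TypeError: 'JavaPackage' object is not callable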
Any ideas? All of this works on Spark 1.6.0, but there is no SparkSession there.
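In case it helps narrow things down, this is the kind of environment dump that could be added to get_session.py to compare the two launch paths; it's plain os/sys introspection, and the variable names are just the ones the workflow above sets:

# Dump the environment and Python path inside the YARN container to see
# which pyspark/py4j copies the launcher actually picks up.
import os, sys
print "SPARK_HOME:", os.environ.get("SPARK_HOME")
print "PYSPARK_ARCHIVES_PATH:", os.environ.get("PYSPARK_ARCHIVES_PATH")
print "PYTHONPATH:", os.environ.get("PYTHONPATH")
print "sys.path:"
for p in sys.path:
    print "   ", p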