I am connecting to and ingesting data into a Phoenix table with PySpark, using the following code:
dataframe.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "tablename").option("zkUrl", "localhost:2181").save()
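For context, sparkPhoenix.py boils down to something like the sketch below; the SparkSession setup, the sample DataFrame, and the table name are illustrative assumptions, not the original script.

# Minimal sketch of a PySpark-to-Phoenix write job (assumed names and sample data)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("phoenix-write-example").getOrCreate()

# Hypothetical DataFrame standing in for the data actually being ingested
dataframe = spark.createDataFrame([(1, "alice"), (2, "bob")], ["ID", "NAME"])

# Same write call as above: the phoenix-spark connector needs hbase-site.xml and the
# Phoenix client jars on the driver/executor classpath, plus the ZooKeeper quorum in zkUrl
dataframe.write \
    .format("org.apache.phoenix.spark") \
    .mode("overwrite") \
    .option("table", "TABLENAME") \
    .option("zkUrl", "localhost:2181") \
    .save()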
When I run this through spark-submit, it works fine with the following command:
spark-submit --master local --deploy-mode client --files /etc/hbase/conf/hbase-site.xml --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" sparkPhoenix.py
When I run it through Oozie, I get the error below:
.ConnectionClosingException: Connection to ip-172-31-44-101.us-west-2.compute.internal/172.31.44.101:16020 is closing. Call id=9, waitTime=3, row 'SYSTEM:CATALOG,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101
Below is the workflow:
<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
<spark
xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>local</master>
<mode>client</mode>
<name>Spark Example</name>
<jar>sparkPhoenix.py</jar>
<spark-opts>--py-files Leia.zip --files /etc/hbase/conf/hbase-site.xml --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar</spark-opts>
</spark>
<ok to="successEmailaction"/>
<error to="failEmailaction"/>
</action>
With spark-submit I initially got the same error and fixed it by passing the required jars. With Oozie, it still throws the error even though I pass the jars.
Answer 0 (score: 0)
I found that "--files /etc/hbase/conf/hbase-site.xml" does not work when integrating with Oozie. Instead, I passed hbase-site.xml through the file tag in the Oozie spark action, as shown below. It works fine now.
<file>file:///etc/hbase/conf/hbase-site.xml</file>
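For clarity, here is a sketch of how the corrected action might look, combining the workflow above with the <file> element placed after <spark-opts>; the long classpath values from the question are shortened to "..." here.

<spark xmlns="uri:oozie:spark-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>local</master>
    <mode>client</mode>
    <name>Spark Example</name>
    <jar>sparkPhoenix.py</jar>
    <spark-opts>--py-files Leia.zip --conf spark.executor.extraClassPath=... --conf spark.driver.extraClassPath=...</spark-opts>
    <file>file:///etc/hbase/conf/hbase-site.xml</file>
</spark>

The only change from the original workflow is that "--files /etc/hbase/conf/hbase-site.xml" is removed from <spark-opts> and the file is shipped via the <file> element instead.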