我按照Kubernetes' Spark example上的说明进行操作。我可以启动PySpark shell。但是,我需要使用PySpark和JDBC连接到我的Postgres数据库。在我尝试使用Kubernetes之前,我使用spark-defaults.conf
文件让JDBC使用Spark:
spark.driver.extraClassPath /spark/postgresql-9.4.1209.jre7.jar
spark.executor.extraClassPath /spark/postgresql-9.4.1209.jre7.jar
我还必须先将驱动程序下载到该位置。我如何用Kubernetes实现同样的目标?我不认为我能做到
kubectl exec zeppelin-controller-xzlrf -it pyspark --jars /spark/postgresql-9.4.1209.jre7.jar
因为jar首先必须在容器内。因此,如果我可以在容器中获取jar文件,也许我可以使它工作,但我该怎么做?非常感谢任何想法或帮助。
更新:我尝试了@ LostInOverflow的解决方案,但遇到了以下情况:
kubectl exec zeppelin-controller-2p3ew -it -- pyspark --packages org.postgresql:postgresql:9.4.1209.jre7.jar
似乎启动并识别包参数但仍然失败:
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 2294ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: org.postgresql#postgresql;9.4.1209.jre7.jar
==== local-m2-cache: tried
file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom
-- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:
file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar
==== local-ivy-cache: tried
/root/.ivy2/local/org.postgresql/postgresql/9.4.1209.jre7.jar/ivys/ivy.xml
==== central: tried
https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom
-- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:
https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom
-- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:
http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.postgresql#postgresql;9.4.1209.jre7.jar: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.postgresql#postgresql;9.4.1209.jre7.jar: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
File "/opt/spark/python/pyspark/shell.py", line 43, in <module>
sc = SparkContext(pyFiles=add_files)
File "/opt/spark/python/pyspark/context.py", line 110, in __init__
SparkContext._ensure_initialized(self, gateway=gateway)
File "/opt/spark/python/pyspark/context.py", line 234, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
答案 0 :(得分:0)
您可以--packages
使用坐标代替--jars
:
--packages org.postgresql:postgresql:9.4.1209.jre7.jar