How to load jars such as JDBC drivers in Kubernetes-Spark

Time: 2016-09-09 20:40:43

Tags: postgresql jdbc apache-spark pyspark kubernetes

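I followed the instructions in Kubernetes' Spark example, and I can start the PySpark shell. However, I need to connect to my Postgres database from PySpark over JDBC. Before I tried Kubernetes, I got JDBC working with Spark through the spark-defaults.conf file: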

spark.driver.extraClassPath /spark/postgresql-9.4.1209.jre7.jar
spark.executor.extraClassPath /spark/postgresql-9.4.1209.jre7.jar

I also had to download the driver to that location first. How can I achieve the same thing with Kubernetes? I don't think I can run

kubectl exec zeppelin-controller-xzlrf -it pyspark --jars /spark/postgresql-9.4.1209.jre7.jar

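because the jar first has to be inside the container. So if I could get the jar file into the container, maybe I could make it work, but how do I do that? Any ideas or help would be greatly appreciated.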
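Editor's sketch of the copy step described above, not a tested recipe: it only builds the two command lines and does not run them. It assumes a kubectl version that provides `kubectl cp` (which in turn expects a `tar` binary inside the container), and the remote path under /tmp is a hypothetical choice. The helper names `copy_jar_command` and `pyspark_command` are made up for illustration.

```python
import shlex


def copy_jar_command(pod, local_jar, remote_jar):
    """Build the argv for copying a local jar into a pod with `kubectl cp`."""
    return ["kubectl", "cp", local_jar, "{}:{}".format(pod, remote_jar)]


def pyspark_command(pod, remote_jar):
    """Build the argv for starting PySpark in the pod with the jar attached."""
    # Everything after "--" is passed to the command run inside the container.
    return ["kubectl", "exec", "-it", pod, "--", "pyspark", "--jars", remote_jar]


if __name__ == "__main__":
    pod = "zeppelin-controller-xzlrf"         # pod name from the question
    jar = "postgresql-9.4.1209.jre7.jar"      # driver jar from the question
    remote = "/tmp/" + jar                    # assumption: any writable path in the container
    for argv in (copy_jar_command(pod, jar, remote), pyspark_command(pod, remote)):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

A jar copied this way lives only in that container's filesystem, so it would have to be copied again if the pod is recreated.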

Update: I tried @LostInOverflow's solution, but ran into the following:

kubectl exec zeppelin-controller-2p3ew -it -- pyspark --packages org.postgresql:postgresql:9.4.1209.jre7.jar

It appears to start and recognize the packages argument, but it still fails:

Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 2294ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.postgresql#postgresql;9.4.1209.jre7.jar

    ==== local-m2-cache: tried

      file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

    ==== local-ivy-cache: tried

      /root/.ivy2/local/org.postgresql/postgresql/9.4.1209.jre7.jar/ivys/ivy.xml

    ==== central: tried

      https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.postgresql#postgresql;9.4.1209.jre7.jar: not found

        ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.postgresql#postgresql;9.4.1209.jre7.jar: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/spark/python/pyspark/context.py", line 110, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/spark/python/pyspark/context.py", line 234, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>> 

1 answer:

Answer 0: (score: 0)

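You can use --packages with Maven coordinates instead of --jars. Coordinates take the form groupId:artifactId:version, with no .jar suffix:

--packages org.postgresql:postgresql:9.4.1209.jre7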
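The resolution log in the question shows Ivy searching for a version literally named `9.4.1209.jre7.jar`, which suggests the `.jar` suffix should not be part of the coordinate. The following is a minimal sketch of that reasoning. The helpers `parse_coordinate` and `central_jar_url` are hypothetical names, and the URL layout is simply the one visible in the "central: tried" lines of the log.

```python
def parse_coordinate(coord):
    """Split a groupId:artifactId:version coordinate and reject a '.jar' suffix."""
    parts = coord.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected groupId:artifactId:version, got %r" % coord)
    group, artifact, version = parts
    if version.endswith(".jar"):
        raise ValueError("version %r looks like a file name; drop '.jar'" % version)
    return group, artifact, version


def central_jar_url(coord):
    """Return the Maven Central jar URL a resolver would try for the coordinate."""
    group, artifact, version = coord.split(":")
    return "https://repo1.maven.org/maven2/{}/{}/{}/{}-{}.jar".format(
        group.replace(".", "/"), artifact, version, artifact, version
    )


if __name__ == "__main__":
    # The coordinate as typed in the question reproduces the failing URL from the log.
    print(central_jar_url("org.postgresql:postgresql:9.4.1209.jre7.jar"))
    # Without the suffix, the URL points at a normally named jar.
    print(central_jar_url("org.postgresql:postgresql:9.4.1209.jre7"))
```

Note that --packages downloads the artifact at startup, so the container needs network access to the repository.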