java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST

Asked: 2018-10-09 15:51:24

Tags: python-3.x apache-spark pyspark

I am using PySpark with the spark-redshift driver to load data into a Redshift table. Since I had read that 'com.databricks.spark.redshift' does not work with Spark 2.3.1, I am using Spark 2.1.0 with Python 3.5.6.
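The write is roughly of this shape (a sketch only; the JDBC URL, table name, and tempdir below are placeholders, not my real settings):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()  # configuration omitted here for brevity
sqlContext = SQLContext(sc)

# Placeholder source DataFrame; in reality the data comes from my pipeline.
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Write through the spark-redshift data source; all option values are placeholders.
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://example-host:5439/dev?user=me&password=secret") \
    .option("dbtable", "my_table") \
    .option("tempdir", "s3n://my-bucket/tmp/") \
    .mode("error") \
    .save()

The interactive pyspark shell itself starts up fine: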

Python 3.5.6 (default, Sep 26 2018, 21:49:11)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/10/09 15:41:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/10/09 15:41:25 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/10/09 15:41:26 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 3.5.6 (default, Sep 26 2018 21:49:11)

When I try to create a SparkContext from a script with

sc = SparkContext(conf=conf)
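where conf is nothing unusual, roughly this (a sketch; the app name is a placeholder):

from pyspark import SparkConf, SparkContext

# Placeholder configuration; my real conf only sets app-specific options.
conf = SparkConf().setAppName("redshift-load")
sc = SparkContext(conf=conf)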

I receive the following error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/10/09 15:34:46 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:59)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:59)
    at org.apache.spark.api.python.PythonGatewayServer$$anonfun$main$1.apply$mcV$sp(PythonGatewayServer.scala:50)
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1228)
    at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:37)
    at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

After reading this post, it sounds like the error is related to a version mismatch. I don't believe my versions are mismatched, though:

[ec2-user@ip-172-31-50-110 ~]$ echo $SPARK_HOME
/opt/spark-2.1.0-bin-hadoop2.7
[ec2-user@ip-172-31-50-110 ~]$ echo $PYSPARK_PYTHON
/usr/bin/python3.5
[ec2-user@ip-172-31-50-110 ~]$ echo $PYTHONPATH
/usr/bin/python3.5
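As an extra sanity check, this snippet shows which pyspark installation the interpreter actually picks up (generic, nothing project-specific):

import os
import pyspark

# Compare the Spark distribution the driver scripts use (SPARK_HOME) with
# the pyspark package that the Python interpreter actually imports.
print("SPARK_HOME:     ", os.environ.get("SPARK_HOME"))
print("pyspark module: ", pyspark.__file__)
print("pyspark version:", pyspark.__version__)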

I'm not sure where to go from here in diagnosing the issue. Any help would be greatly appreciated!

Edit: output of spark-shell --version:
[ec2-user@ip-172-31-50-110 ~]$ spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_181
Branch
Compiled by user jenkins on 2016-12-16T02:04:48Z
Revision
Url
Type --help for more information.

And the output of $PYSPARK_PYTHON -c "import pyspark; print(pyspark.__version__)":

[ec2-user@ip-172-31-50-110 ~]$ $PYSPARK_PYTHON -c "import pyspark; print(pyspark.__version__)"
2.3.2

0 answers