Strange error when initializing SparkContext in Python

Time: 2017-06-13 11:33:35

Tags: python apache-spark pyspark

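I have been using Spark 2.0.1 and tried to upgrade to a newer version, 2.1.1, by downloading the tar file to my local machine and changing the paths.

However, now whenever I try to run any program, it fails while initializing the SparkContext, i.e. at: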

    sc = SparkContext()

The entire sample code I am trying to run is:

    import os

    # Point PySpark at the newly unpacked 2.1.1 distribution
    os.environ['SPARK_HOME'] = "/opt/apps/spark-2.1.1-bin-hadoop2.7/"

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # This is the call that fails
    sc = SparkContext()

    sqlContext = SQLContext(sc)

    df_tract_alpha = sqlContext.read.parquet("tract_alpha.parquet")
    print(df_tract_alpha.count())

The exception is raised right at the start, i.e.:


    Traceback (most recent call last):
      File "/home/vna/scripts/global_score_pipeline/test_code_here.py", line 47, in <module>
        sc = SparkContext()
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 118, in __init__
        conf, jsc, profiler_cls)
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 182, in _do_init
        self._jsc = jsc or self._initialize_context(self._conf._jconf)
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 249, in _initialize_context
        return self._jvm.JavaSparkContext(jconf)
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.NumberFormatException: For input string: "Ubuntu"
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

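I am not passing "Ubuntu" anywhere in my variables or environment variables.

I also tried changing the call to sc = SparkContext(master='local'), but the problem remains the same.

Please help me identify the issue.

Edit: contents of spark-defaults.conf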


    spark.master                     spark://master:7077
    # spark.eventLog.enabled           true
    # spark.eventLog.dir               hdfs://namenode:8021/directory
    spark.serializer                 org.apache.spark.serializer.KryoSerializer
    spark.driver.memory              8g
    spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.driver.extraClassPath /opt/apps/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.35-bin.jar
    spark.executor.extraClassPath /opt/apps/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.35-bin.jar

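1 Answer:

Answer 0: (score: 0)

Have you checked your configuration files (e.g. spark-defaults.conf)? It may be a parse error for a field that expects an integer. For example, if you tried to set spark.executor.cores Ubuntu, you could get that exception.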
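
A minimal sketch of that check, assuming the file uses the whitespace-separated key/value layout shown in the question. The function name find_non_integer_settings and the set INTEGER_KEYS are hypothetical helpers for illustration, and the keys listed as integer-valued are an assumption, not an exhaustive list taken from Spark:

```python
# Keys assumed (for illustration only) to require integer values.
INTEGER_KEYS = {
    "spark.executor.cores",
    "spark.driver.cores",
    "spark.executor.instances",
}


def find_non_integer_settings(conf_text, integer_keys=INTEGER_KEYS):
    """Return (key, value) pairs whose value is not a valid integer.

    Simplified parser: skips blank lines and '#' comments, and splits
    each remaining line on the first run of whitespace.
    """
    bad = []
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # key without a value; nothing to check
        key, value = parts[0], parts[1].strip()
        if key in integer_keys:
            try:
                int(value)
            except ValueError:
                bad.append((key, value))
    return bad


if __name__ == "__main__":
    # Hypothetical sample; replace with the contents of your own file,
    # e.g. open(os.path.join(spark_home, "conf", "spark-defaults.conf")).read()
    sample = "spark.driver.memory 8g\nspark.executor.cores Ubuntu\n"
    for key, value in find_non_integer_settings(sample):
        print("suspicious setting: %s = %r" % (key, value))
```

If this reports nothing for your spark-defaults.conf, the offending string is probably coming from somewhere other than that file.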