I get an IllegalArgumentException when creating a SparkSession

Asked: 2017-02-09 21:20:39

Tags: apache-spark pyspark pyspark-sql

I am using PySpark in a Jupyter notebook with Spark 2.1.0 and Python 2.7. I am trying to create a new SparkSession with the following code:

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext

spark = SparkSession\
    .builder\
    .appName("Bank Service Classifier")\
    .config("spark.sql.crossJoin.enabled","true")\
    .getOrCreate()

sc = SparkContext()
sqlContext = SQLContext(sc)

However, I get the following error:

IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-40-2683a8d0ffcf> in <module>()
      4 from pyspark.sql import SQLContext
      5 
----> 6 spark = SparkSession    .builder    .appName("example-spark")    .config("spark.sql.crossJoin.enabled","true")    .getOrCreate()
      7 
      8 sc = SparkContext()

/srv/spark/python/pyspark/sql/session.py in getOrCreate(self)
    177                     session = SparkSession(sc)
    178                 for key, value in self._options.items():
--> 179                     session._jsparkSession.sessionState().conf().setConfString(key, value)
    180                 for key, value in self._options.items():
    181                     session.sparkContext._conf.set(key, value)

/srv/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

/srv/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     77                 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
     78             if s.startswith('java.lang.IllegalArgumentException: '):
---> 79                 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
     80             raise
     81     return deco

IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

How can I fix this?

1 Answer:

Answer 0 (score: 1)

I ran into the same error. Downloading Spark pre-built for Hadoop 2.6 instead of 2.7 worked for me.
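
In case it helps to confirm which Hadoop version a given Spark download was actually built against, the check below is a minimal sketch in PySpark. It goes through `sc._jvm`, PySpark's internal py4j gateway to the JVM, so treat it as a debugging aid rather than a stable API:

from pyspark import SparkContext

# A bare SparkContext can usually still be created even when the
# Hive-backed session state fails to instantiate, so this check
# remains usable while debugging the error above.
sc = SparkContext.getOrCreate()

# org.apache.hadoop.util.VersionInfo is Hadoop's own version API,
# reached here through PySpark's internal JVM gateway.
print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())

sc.stop()

If this prints a Hadoop 2.7.x version, you are on the same build that failed for me, and switching to the Hadoop 2.6 package is worth trying.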