SparkSession error related to Hive

Time: 2018-03-05 09:00:11

Tags: hadoop apache-spark hive pyspark spark-dataframe

My operating system is Windows 10.

from pyspark import SparkContext
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

This code gives the following error:

Py4JJavaError                             Traceback (most recent call last)
~\Documents\spark\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

~\Documents\spark\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                 "An error occurred while calling {0}{1}{2}.\n".
--> 319                 format(target_id, ".", name), value)
    320             else:

Py4JJavaError: An error occurred while calling o22.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 18 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 26 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
    ... 31 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
    ... 39 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

My full code is here:

import findspark
# findspark.init must run before any pyspark import can succeed
findspark.init('C:/Users/asus/Documents/spark/spark-2.1.0-bin-hadoop2.7')

import pyspark
from pyspark import SparkContext
from pyspark.conf import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

1 Answer:

Answer 0 (score: 1)

From the code you posted, it looks like you are a Java developer, or perhaps you were in a hurry when pasting the code. In Python you do not declare a variable with its type the way we do in Java, as in SparkContext sc = SparkContext.getOrCreate().

Also, starting with Spark 2.0+, you create a single SparkSession object, which is the entry point of your application, and you derive the SparkContext from that object. Trying to create another SparkContext with "sc = SparkContext.getOrCreate()" causes an error, because by design only one SparkContext can run in a given JVM. If you need a new context, you must first stop the previously created SparkContext with sc.stop().
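The pattern above can be sketched as follows. This is a minimal sketch, not code from the original post: the function names are mine, it assumes a working local Spark installation, and the pyspark import is deliberately placed inside the function so the file can be loaded for inspection even where Spark is absent.

```python
def get_spark_session(app_name="example"):
    """Return the one SparkSession for this JVM, creating it if needed."""
    # Lazy import: only needed when a session is actually requested.
    from pyspark.sql import SparkSession

    # getOrCreate() returns the already-running session if there is one;
    # it never tries to start a second SparkContext in the same JVM.
    return (SparkSession.builder
            .appName(app_name)
            .getOrCreate())


def restart_spark(spark, app_name="example"):
    """Stop the current session (and its context) before building a new one."""
    spark.stop()  # releases the one-per-JVM SparkContext
    return get_spark_session(app_name)
```

With this pattern you never call SparkContext.getOrCreate() yourself; spark.sparkContext gives you the context that the session already owns.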

Judging from your stack trace and code, I also think you are testing the application locally and do not have Hadoop and Hive installed on your local machine, which gives you the error:

Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at ...

You can install Hadoop and Hive on your Windows machine and then try the following snippet.

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName('CalculatingGeoDistances') \
    .enableHiveSupport() \
    .getOrCreate()

sc = spark.sparkContext
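If installing full Hadoop and Hive is not an option, a common workaround on Windows is to point HADOOP_HOME at a directory whose bin folder contains winutils.exe and to set an explicit warehouse directory before enabling Hive support. This is only a hedged sketch: the paths and the function name are placeholders I chose, not values from the original post, and whether it suffices depends on your Spark build.

```python
import os


def build_windows_spark(app_name="CalculatingGeoDistances",
                        hadoop_home=r"C:\hadoop",
                        warehouse_dir="file:///C:/tmp/spark-warehouse"):
    """Sketch: configure a Hive-enabled SparkSession on Windows.

    hadoop_home must contain bin\\winutils.exe; both paths here are
    illustrative placeholders, not values from the original question.
    """
    # Spark's Hadoop libraries look winutils.exe up via HADOOP_HOME.
    os.environ["HADOOP_HOME"] = hadoop_home

    # Lazy import so this sketch can be loaded without a Spark install.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName(app_name)
            .config("spark.sql.warehouse.dir", warehouse_dir)
            .enableHiveSupport()
            .getOrCreate())
```

Setting spark.sql.warehouse.dir explicitly avoids Spark falling back to a default location it may not be able to create on Windows.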