Spark-submit throws an error when using Hive tables

Date: 2016-05-20 11:05:35

Tags: apache-spark hive spark-dataframe

I have a strange error: I am trying to write data to Hive, and it works fine in spark-shell, but when I use spark-submit it throws a "database/table not found in default" error.

Here is the code I am running through spark-submit; I am using a custom build of Spark 2.0.0:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.table("spark_schema.iris_ori")

Here is the command I am using:

/home/ec2-user/Spark_Source_Code/spark/bin/spark-submit --class TreeClassifiersModels --master local[*] /home/ec2-user/Spark_Snapshots/Spark_2.6/TreeClassifiersModels/target/scala-2.11/treeclassifiersmodels_2.11-1.0.3.jar /user/ec2-user/Input_Files/defPath/iris_spark SPECIES~LBL+PETAL_LENGTH+PETAL_WIDTH RAN_FOREST 0.7 123 12

Here is the error:

16/05/20 09:05:18 INFO SparkSqlParser: Parsing command: spark_schema.measures_20160520090502
Exception in thread "main" org.apache.spark.sql.AnalysisException: Database 'spark_schema' does not exist;
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:37)
        at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:195)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:360)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:464)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:458)
        at TreeClassifiersModels$.main(TreeClassifiersModels.scala:71)
        at TreeClassifiersModels.main(TreeClassifiersModels.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:726)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:183)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:208)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:122)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

1 Answer:

Answer 0 (score: 2)

The problem is due to a change in Spark 2.0.0: HiveContext is deprecated there. Your plain SQLContext carries no Hive support, so under spark-submit the query runs against Spark's in-memory catalog (note the InMemoryCatalog frames in the stack trace), where the database 'spark_schema' does not exist. To read/write Hive tables on Spark 2.0.0, we need to use a SparkSession, as shown below.

val sparkSession = SparkSession.withHiveSupport(sc)
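
In case your build does not expose `withHiveSupport`, the standard Spark 2.0.0 API creates the session through a builder, with Hive support enabled explicitly (this also requires a Spark build with Hive classes and, typically, a hive-site.xml on the classpath). A minimal sketch; the app name is illustrative and the final saveAsTable target is a hypothetical table name, not from the question:

import org.apache.spark.sql.SparkSession

// Enable Hive support explicitly; without it, Spark falls back to the
// in-memory catalog seen in the stack trace above.
val sparkSession = SparkSession.builder()
  .appName("TreeClassifiersModels") // illustrative app name
  .enableHiveSupport()
  .getOrCreate()

// Read the Hive table through the session instead of a plain SQLContext.
val iris = sparkSession.table("spark_schema.iris_ori")

// Writes now also go through the Hive catalog.
iris.write.mode("overwrite").saveAsTable("spark_schema.iris_copy") // hypothetical target table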