When submitting a Spark 1.6.0 SQL application against Hive 2.1.0, I get the following error:
Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:512)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:252)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:443)
at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at my.package.AbstractProcess$class.prepareContexts(AbstractProcess.scala:33)
at my.package.PdfTextExtractor$.prepareContexts(PdfTextExtractor.scala:11)
at my.package.AbstractProcess$class.main(AbstractProcess.scala:21)
at my.package.PdfTextExtractor$.main(PdfTextExtractor.scala:11)
at my.package.PdfTextExtractor.main(PdfTextExtractor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
It occurs when I call:
hiveContext.sql(sqlString)
I submit the application with the spark-submit tool:
appJar="${script_dir}/../../lib/application.jar"
jars="/usr/lib/spark/lib/spark-assembly.jar,/usr/lib/spark/lib/spark-examples.jar,/usr/share/java/scala-library.jar,/usr/lib/hive-exec.jar,/usr/lib/hive-2.1.0/lib/hive-metastore-2.1.0.jar,/usr/lib/hive-2.1.0/jdbc/hive-jdbc-2.1.0-standalone.jar,/usr/lib/hive-2.1.0/lib/hive-jdbc-2.1.0.jar"
CLASSPATH=`yarn classpath`
exec spark-submit --verbose \
--master 'yarn' \
--deploy-mode 'client' \
--name 'extract_text_from_krs_pdf' \
--jars ${jars} \
--executor-memory 3g \
--driver-cores 2 \
--driver-class-path "${CLASSPATH}:/usr/lib/spark/lib/spark-assembly.jar:/usr/lib/spark/lib/spark-examples.jar:/usr/share/java/scala-library.jar:/usr/lib/hive-exec.jar:/usr/lib/hive-2.1.0/lib/*:/usr/lib/hive-2.1.0/jdbc/*" \
--class 'my.package.PdfTextExtractor' \
"$appJar" "$dt" "$db"
I followed the instructions in the Apache Spark documentation (Interacting with Different Versions of Hive Metastore) and in Fixing Spark default metastore and Hive metastore mismatch issues, so my /etc/spark/conf/spark-defaults.conf looks like this:
spark.sql.hive.metastore.version 2.1.0
spark.sql.hive.metastore.jars /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hive-2.1.0/lib/*:/usr/lib/hive-2.1.0/jdbc/*:/usr/lib/spark/lib/*
But it did not help at all. I am really out of ideas.
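As a side note on the spark.sql.hive.metastore.jars value above: it lists /etc/hadoop/conf three times and the hadoop-yarn entries twice. Repeated classpath entries are normally harmless and are unlikely to explain the NoSuchFieldError, but trimming them makes the setting easier to read. A minimal sketch, assuming a POSIX shell; dedupe_classpath is a hypothetical helper name, not part of Spark or Hadoop:

```shell
# Hypothetical helper: drop repeated entries from a colon-separated
# classpath, keeping the first occurrence of each entry in order.
# The body runs in a subshell, so IFS and "set -f" do not leak out.
dedupe_classpath() (
  set -f                 # keep wildcard entries such as lib/* literal
  IFS=:
  result=
  for entry in $1; do
    [ -n "$entry" ] || continue            # ignore empty segments
    case ":$result:" in
      *":$entry:"*) ;;                     # already present, skip it
      *) result="${result:+$result:}$entry" ;;
    esac
  done
  printf '%s\n' "$result"
)

# Example with a shortened, made-up input:
dedupe_classpath "/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/etc/hadoop/conf"
# prints /etc/hadoop/conf:/usr/lib/hadoop/lib/*
```

The same helper could be applied to the output of `yarn classpath` before the result is pasted into spark-defaults.conf.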