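I'm having a problem with Hive on Spark. When I run a simple query like select * from table_name in the Hive console, everything works fine, but when I execute select count(*) from table_name, the query terminates with the following error: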
Query ID = ab_20160515134700_795fc14c-e89b-4172-bcc6-0cfcffadcd88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = d5e1856e-de67-4e2d-a914-ca1aae324b7f
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Versions:
hadoop-2.7.2
apache-hive-2.0.0
spark-1.6.0-bin-hadoop2
scala: 2.11.8
I set spark.master in hive-site.xml, and now I get:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:106) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1840) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1361) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645) [hive-cli-2.0.0.jar:2.0.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
Caused by: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:180) ~[hive-exec-2.0.0.jar:2.0.0]
at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
16/05/16 18:00:33 [Driver]: WARN client.SparkClientImpl: Child process exited with code 1
Then I built Spark 1.6.1 and Hive 2.0.0 myself, and the error changed to:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10861)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Answer (score: 0)
I ran into the same problem as you with Hive 2.0.0 and Spark 1.6.1. As mentioned, it has been discussed at issues.apache.org/jira/browse/HIVE-9970.
For Hive:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn clean package -Pdist -DskipTests
The build output is in packaging/target/apache-hive-2.x.y-bin. Configure hive-site.xml.
For Spark:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
The result is in dist/. Configure spark-defaults.conf.
Since you built Spark without Hadoop, you need to include the Hadoop package jar paths in $SPARK_DIST_CLASSPATH. See this documentation page. You can also read the Hive on Spark guide for reference.
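The SPARK_DIST_CLASSPATH step can be sketched as a small shell script. Spark's documentation on "Hadoop free" builds suggests putting export SPARK_DIST_CLASSPATH=$(hadoop classpath) into conf/spark-env.sh; the script below only automates that, assuming SPARK_HOME points at the unpacked dist/ output and that the hadoop command is on PATH whenever Spark sources spark-env.sh. The helper name add_dist_classpath is hypothetical, and this is a sketch, not a verified fix for the errors in the question.

```shell
#!/usr/bin/env bash
# Sketch: let a "Hadoop free" Spark build (built with -Phadoop-provided)
# find Hadoop's jars. Assumes SPARK_HOME is the unpacked dist/ output and
# that `hadoop` is on PATH whenever Spark sources conf/spark-env.sh.
set -eu

# Hypothetical helper: append the SPARK_DIST_CLASSPATH export to
# <spark_home>/conf/spark-env.sh, but only if it is not already there.
add_dist_classpath() {
  local conf_dir="$1/conf"
  local env_file="$conf_dir/spark-env.sh"
  # Single quotes keep $(hadoop classpath) unexpanded here, so it is
  # evaluated each time Spark sources spark-env.sh, not when this runs.
  local line='export SPARK_DIST_CLASSPATH=$(hadoop classpath)'
  mkdir -p "$conf_dir"
  touch "$env_file"
  # -q: no output, -x: whole-line match, -F: fixed string (no regex).
  if ! grep -qxF "$line" "$env_file"; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}

if [ -n "${SPARK_HOME:-}" ]; then
  add_dist_classpath "$SPARK_HOME"
else
  echo "SPARK_HOME is not set; nothing changed" >&2
fi
```

Leaving the $(hadoop classpath) substitution unexpanded means the classpath is recomputed every time Spark starts, so it follows later changes to the Hadoop installation. Afterwards, rerunning the failing query, for example with hive --hiveconf hive.execution.engine=spark -e 'select count(*) from table_name;', is one way to check whether the setup works.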