I have installed CDH 5.6 with Spark on my VM.
In Hive, when I submit any query with the mr
engine it runs fine, but when I change the engine to spark
the log shows the job being submitted and then loops forever, like this:
hive> set hive.execution.engine=spark;
hive> create table landing.tmp as select * from landing.employee;
Query ID = root_20171203165454_64ca3a41-25af-4cd5-a2c1-9b7c9e8f49cd
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = cf7a1eff-7ba6-48f5-90dd-f5de3794c36e
Query Hive on Spark job[0] stages:
0
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2017-12-03 16:55:27,468 Stage-0_0: 0/1
2017-12-03 16:55:30,528 Stage-0_0: 0/1
2017-12-03 16:55:33,585 Stage-0_0: 0/1
2017-12-03 16:55:37,557 Stage-0_0: 0/1
2017-12-03 16:55:40,617 Stage-0_0: 0/1
2017-12-03 16:55:43,683 Stage-0_0: 0/1
2017-12-03 16:55:46,764 Stage-0_0: 0/1
2017-12-03 16:55:49,822 Stage-0_0: 0/1
2017-12-03 16:55:52,900 Stage-0_0: 0/1
2017-12-03 16:55:55,945 Stage-0_0: 0/1
2017-12-03 16:55:58,999 Stage-0_0: 0/1
2017-12-03 16:56:02,077 Stage-0_0: 0/1
2017-12-03 16:56:05,134 Stage-0_0: 0/1
2017-12-03 16:56:08,196 Stage-0_0: 0/1
2017-12-03 16:56:11,238 Stage-0_0: 0/1
2017-12-03 16:56:14,280 Stage-0_0: 0/1
2017-12-03 16:56:17,345 Stage-0_0: 0/1
2017-12-03 16:56:20,380 Stage-0_0: 0/1
2017-12-03 16:56:23,405 Stage-0_0: 0/1
2017-12-03 16:56:26,464 Stage-0_0: 0/1
2017-12-03 16:56:29,534 Stage-0_0: 0/1
2017-12-03 16:56:32,598 Stage-0_0: 0/1
2017-12-03 16:56:35,627 Stage-0_0: 0/1
2017-12-03 16:56:38,661 Stage-0_0: 0/1
2017-12-03 16:56:41,718 Stage-0_0: 0/1
....
...
I have added the spark-assembly-xx*....jar to the /usr/lib/hive/lib
path.
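For reference, this is roughly how the jar was copied (a minimal sketch; the source path under /usr/lib/spark/lib and the exact assembly jar name are assumptions based on a typical CDH 5.x layout):

# Assumed CDH 5.x layout; the assembly jar name varies by build
cp /usr/lib/spark/lib/spark-assembly-*.jar /usr/lib/hive/lib/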
I have also added the following properties to hive-site.xml:
<property>
  <name>spark.master</name>
  <value>spark://192.168.190.128:7077</value>
</property>
<property>
  <name>spark.home</name>
  <value>/usr/lib/spark</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/usr/lib/hive/spark_log</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>spark.executor.cores</name>
  <value>3</value>
</property>
<property>
  <name>spark.driver.memory</name>
  <value>1024m</value>
</property>
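As a sanity check I can print the effective values from the session itself (a minimal sketch, assuming the CLI reads the same hive-site.xml; "set <property>;" with no value prints the current setting):

# Print the effective values for this Hive session
hive -e "set hive.execution.engine; set spark.master;"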
I have also created the directory /usr/lib/hive/spark_log that the spark.eventLog.dir
property points to.
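Roughly how it was created (a sketch; the hive:hive ownership is an assumption, the directory only needs to be writable by the user running the Hive session):

mkdir -p /usr/lib/hive/spark_log
# Assumed owner; adjust if Hive runs as a different user
chown hive:hive /usr/lib/hive/spark_log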
I even tried setting spark.master
to spark://localhost.localdomain:7077.
Am I missing something?