I am new to Hadoop and Hive. I have (maybe...) successfully configured them on four servers. I can run the wordcount example on Hadoop. Then I created a table in Hive with only 5 records, just for testing. However, when I submit a select count(*) on that table, it takes about 2-3 minutes (maybe longer) just to submit the MR job:
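For reference, this is roughly the setup I used. The database (test) and table (mrtest) names match the session below; the column names and values are just placeholders from my test, nothing special:

```shell
# Minimal reproduction of the test table (a sketch; columns/values are
# arbitrary test data, not anything from a real dataset):
hive -e "
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE IF NOT EXISTS mrtest (id INT, name STRING);
INSERT INTO mrtest VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
SELECT COUNT(*) FROM mrtest;  -- this is the query that is so slow
"
```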
(This is the message from the console.)
Query ID = root_20181215204727_4ffb47f7-022f-4661-8b46-fa1eb1d00d65
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
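(Those hints are ordinary Hive session settings; for example, pinning a single reducer for this tiny table would look like this. A sketch only; the value is an example, not a recommendation:)

```shell
# Apply one of the session settings from the hints above before the query:
hive -e "
set mapreduce.job.reduces=1;     -- constant number of reducers
select count(*) from test.mrtest;
"
```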
That is all I get during the first two minutes.
Then I can see the job being submitted in the web UI on port 8088.
This is the message from the console:
Starting Job = job_1544796522138_0016, Tracking URL =
http://aa:8088/proxy/application_1544796522138_0016/
Kill Command = /usr/local/hadoop-2.9.2/bin/hadoop job -kill
job_1544796522138_0016
But it takes about 15 minutes to finish. It gets stuck at map = 0%, reduce = 0% for a long time:
2018-12-15 20:56:37,106 Stage-1 map = 0%, reduce = 0%
2018-12-15 20:57:37,560 Stage-1 map = 0%, reduce = 0%
I looked at the logs and checked the HDFS tmp directory, and found that the job.jar file generated by Hive is 34 MB, which takes a lot of time to distribute between the nodes (I think...). So why is the jar file so big??? Did I include something unnecessary? Or did I make some mistake in the configuration? I think my configuration is very simple.
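For reference, this is roughly how I checked the jar size. The staging path below is an assumption based on the default MapReduce staging directory for the root user; it may differ on other setups:

```shell
# List the job's staging directory on HDFS (path is an assumption;
# it depends on yarn.app.mapreduce.am.staging-dir and the submitting user):
hdfs dfs -ls -h /tmp/hadoop-yarn/staging/root/.staging/job_1544796522138_0016/
# job.jar showed up here at ~34 MB, and every node has to fetch it
# before its task can start.
```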
Here is the complete output of the select:
hive> use test;
OK
Time taken: 0.157 seconds
hive> select count(*) from mrtest;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in
the future versions. Consider using a different execution engine (i.e.
spark, tez) or using Hive 1.X releases.
Query ID = root_20181215204727_4ffb47f7-022f-4661-8b46-fa1eb1d00d65
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1544796522138_0016, Tracking URL =
http://aa:8088/proxy/application_1544796522138_0016/
Kill Command = /usr/local/hadoop-2.9.2/bin/hadoop job -kill
job_1544796522138_0016
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 1
2018-12-15 20:56:37,106 Stage-1 map = 0%, reduce = 0%
2018-12-15 20:57:37,560 Stage-1 map = 0%, reduce = 0%
2018-12-15 20:58:37,708 Stage-1 map = 0%, reduce = 0%
2018-12-15 20:59:37,793 Stage-1 map = 0%, reduce = 0%
2018-12-15 21:00:37,818 Stage-1 map = 0%, reduce = 0%
2018-12-15 21:01:10,431 Stage-1 map = 100%, reduce = 0%, Cumulative
CPU 1.71 sec
2018-12-15 21:01:19,614 Stage-1 map = 100%, reduce = 100%, Cumulative
CPU 3.74 sec
MapReduce Total cumulative CPU time: 3 seconds 740 msec
Ended Job = job_1544796522138_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.74 sec   HDFS Read: 7375 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 740 msec
OK
5
Time taken: 842.183 seconds, Fetched: 1 row(s)