I am running a SQL query with Hive on a single-node cluster, and I am getting this error:
MapReduce Jobs Launched:
Stage-Stage-20: HDFS Read: 4456448 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In the log at http://localhost:50070/logs/hadoop-hadoop-namenode-hadoop.log, the available space appears to be below the configured reserved amount (104857600 bytes is 100 MB, the default of dfs.namenode.resource.du.reserved, the minimum free space the NameNode requires on its storage volume):
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker:
Space available on volume '/dev/mapper/vg_hadoop-lv_root' is 40734720,
which is below the configured reserved amount 104857600
Do you see why this error occurs?
Also, in the disk analyzer I had 12.6 GB of free space before running the query; when execution stopped with the error, the disk analyzer showed only 2 GB of free space left. I also grew the VirtualBox machine by another 30 GB, and the same thing happened.
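(A note on where that space may be going: when Hive runs a job in local mode it writes intermediate data, such as map-join hash tables, to its local scratch directory, which lives under /tmp by default. As a minimal diagnostic sketch, assuming a stock Hive install, typing set followed by just a property name in the hive CLI prints its current value:

hive> set hive.exec.scratchdir;
hive> set hive.exec.local.scratchdir;

The defaults vary by Hive version, but the local scratch directory is typically /tmp/${user.name} on the local filesystem.)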
Full error:
Warning: Map Join MAPJOIN[110][bigTable=?] in task 'Stage-20:MAPRED' is a cross product
Warning: Shuffle Join JOIN[8][tables = [part, supplier]] in Stage 'Stage-1:MAPRED' is a cross product
Query ID = hadoopadmin_20160324175146_7ab8931d-eeac-4e03-b833-3592ed96521f
Total jobs = 9
Stage-27 is selected by condition resolver.
Stage-1 is filtered out by condition resolver.
16/03/24 17:51:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoopadmin/hadoopadmin_20160324175146_7ab8931d-eeac-4e03-b833-3592ed96521f.log
2016-03-24 17:52:01 Starting to launch local task to process map join; maximum memory = 518979584
2016-03-24 17:52:05 Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/hadoopadmin/614990eb-e755-4bca-bccf-be19bd5c6882/hive_2016-03-24_17-51-46_111_5082675810708688029-1/-local-10017/HashTable-Stage-20/MapJoin-mapfile61--.hashtable
2016-03-24 17:52:06 Uploaded 1 File to: file:/tmp/hadoopadmin/614990eb-e755-4bca-bccf-be19bd5c6882/hive_2016-03-24_17-51-46_111_5082675810708688029-1/-local-10017/HashTable-Stage-20/MapJoin-mapfile61--.hashtable (938915 bytes)
2016-03-24 17:52:06 End of local task; Time Taken: 4.412 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 2 out of 9
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-03-24 17:52:10,043 Stage-20 map = 0%, reduce = 0%
2016-03-24 17:53:10,214 Stage-20 map = 0%, reduce = 0%
2016-03-24 17:54:10,272 Stage-20 map = 0%, reduce = 0%
2016-03-24 17:55:10,336 Stage-20 map = 0%, reduce = 0%
2016-03-24 17:56:10,386 Stage-20 map = 0%, reduce = 0%
2016-03-24 17:57:10,435 Stage-20 map = 0%, reduce = 0%
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.apache.commons.logging.impl.Log4JLogger.fatal(Log4JLogger.java:239)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:171)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Ended Job = job_local60483225_0001 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-20: HDFS Read: 4472832 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive>
The query:
select
nation,
o_year,
sum(amount) as sum_profit
from
(select
n_name as nation,
year(o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey and
ps_suppkey = l_suppkey and
ps_partkey = l_partkey and
p_partkey = l_partkey and
o_orderkey = l_orderkey and
s_nationkey = n_nationkey and
p_name like '%plum%' ) as profit
group by nation, o_year
order by nation, o_year desc;
Answer (score: 1):
This could be your problem:
Warning: Map Join MAPJOIN[110][bigTable=?] in task 'Stage-20:MAPRED' is a cross product
Warning: Shuffle Join JOIN[8][tables = [part, supplier]] in Stage 'Stage-1:MAPRED' is a cross product
Cross products have a way of turning tables of a few GB into tables of terabytes when there are many keys... Re-evaluate your query and make sure it is doing what you think it is doing.
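To get a feel for the scale, here is a minimal sketch with two hypothetical tables a and b of one million rows each:

-- Comma-separated FROM with no join predicate: every row of a is
-- paired with every row of b, so the intermediate result has 10^12 rows.
select count(*) from a, b;

-- An equi-join only pairs rows whose keys match, so the intermediate
-- result stays on the order of the input sizes.
select count(*) from a join b on a.id = b.id;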
EDIT: Now that you have added the query, I can say a bit more. This part:
from part,
supplier,
lineitem,
partsupp,
orders,
nation
is where you can optimize. It creates a Cartesian product, and that is your problem. What is happening is that all of the tables are first joined in a cross product, and only then are records kept or dropped according to your where clause, instead of the tables being joined selectively with on clauses. Try this (admittedly ugly) optimized version of the query:
select
nation,
o_year,
sum(amount) as sum_profit
from
(select
n_name as nation,
year(o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
orders o join
(select
l_extendedprice,
l_discount,
l_quantity,
l_orderkey,
n_name,
ps_supplycost
from part p join
(select
l_extendedprice,
l_discount,
l_quantity,
l_partkey,
l_orderkey,
n_name,
ps_supplycost
from partsupp ps join
(select
l_suppkey,
l_extendedprice,
l_discount,
l_quantity,
l_partkey,
l_orderkey,
n_name
from
(select s_suppkey, n_name
from nation n join supplier s on n.n_nationkey = s.s_nationkey
) s1 join lineitem l on s1.s_suppkey = l.l_suppkey
) l1 on ps.ps_suppkey = l1.l_suppkey and ps.ps_partkey = l1.l_partkey
) l2 on p.p_name like '%plum%' and p.p_partkey = l2.l_partkey
) l3 on o.o_orderkey = l3.l_orderkey
) profit
group by nation, o_year
order by nation, o_year desc;
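For readability, the same selective joining can also be written with explicit join syntax instead of nested subqueries, keeping the non-join filter in the where clause. Here is a sketch of just the inner profit subquery, assuming the same TPC-H column names (untested, and whether the planner produces as good a plan depends on your Hive version):

select
  n_name as nation,
  year(o_orderdate) as o_year,
  l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from lineitem l
join supplier s on s.s_suppkey = l.l_suppkey
join nation n on n.n_nationkey = s.s_nationkey
join partsupp ps on ps.ps_suppkey = l.l_suppkey and ps.ps_partkey = l.l_partkey
join part p on p.p_partkey = l.l_partkey
join orders o on o.o_orderkey = l.l_orderkey
where p.p_name like '%plum%'

Either way, the key point is that every join carries an on condition, so Hive never has to materialize the full cross product.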