ORDER BY statement in Hive on Tez throws an OOM exception

Time: 2018-12-17 14:36:40

Tags: hadoop hive out-of-memory apache-tez

I am trying to use ORDER BY to find the earliest time at which an entry was made in a Hive table. The statement looks like this:

SELECT latitude, longitude, timeiss
FROM iss
ORDER BY timeiss
LIMIT 10;

This gives me an error message that looks like this:

https://i.imgur.com/cgIiSKh.png

Just to show that the SELECT statement works without the ORDER BY:

https://i.imgur.com/k6RwAd4.png

latitude    longitude   timeiss
-26.6542    -96.9894    2018-11-28 10:13:42
-39.6293    -80.6984    2018-11-28 10:18:45

I get almost the same error when trying:

SELECT MIN(timeiss)
FROM iss

timeiss is a string.

The full error message as text:

  

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1541164145004_0025_1_00, diagnostics=[Task failed, taskId=task_1541164145004_0025_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:173)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:117)
    at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:138)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:173)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:117)
    at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:138)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:173)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:117)
    at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:138)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:173)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:117)
    at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:138)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1541164145004_0025_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1541164145004_0025_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1541164145004_0025_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

1 answer:

Answer 0 (score: 0)

The Map 1 vertex failed with an OOM exception:

  

java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor

Try increasing the mapper container memory and the JVM heap size.

Like this:

set hive.tez.container.size=9216;
set hive.tez.java.opts=-Xmx6144m;

However, it is better to check your current container and Java heap sizes first and increase them accordingly. Read this for more details: Demystifying Tez Memory Tuning
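One way to check the current values is to issue SET with a property name and no value, which prints the setting for the current session. The sketch below does that and then overrides the settings for the session before re-running the failing query. The stack trace fails inside PipelinedSorter while it allocates its sort buffer, so tez.runtime.io.sort.mb (the Tez sort buffer size in MB) is included as well; whether that property is the deciding factor on this cluster is an assumption. The numbers are illustrative placeholders, not tuned recommendations. The only constraint they encode is that the heap should be smaller than the container and the sort buffer should fit comfortably inside the heap.

```sql
-- Print the current session values (SET without "=value" only displays them).
SET hive.tez.container.size;
SET hive.tez.java.opts;
SET tez.runtime.io.sort.mb;

-- Illustrative session overrides; the values are assumptions, adjust to your cluster.
-- Container size in MB.
SET hive.tez.container.size=4096;
-- JVM heap, kept below the container size to leave room for non-heap memory.
SET hive.tez.java.opts=-Xmx3276m;
-- Sort buffer in MB; it is allocated on the heap, so it must be well below -Xmx.
SET tez.runtime.io.sort.mb=1024;

-- Re-run the query that failed.
SELECT MIN(timeiss)
FROM iss;
```

If the printed container size turns out to be very small, or the sort buffer is close to the heap size, that would be consistent with the OutOfMemoryError being raised while the sorter is initialized rather than while rows are being processed.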