Out of memory in Hive/Tez with LATERAL VIEW json_tuple

Time: 2018-01-26 14:04:42

Tags: hive out-of-memory

[There was an initial question on OOM in tez/hive, but after some answers and comments it is worth asking a new question with the new knowledge.]

I have a query with a large LATERAL VIEW. It joins 4 tables, all of them ORC compressed. The buckets are on the same column. It goes like this:

select 
    10 fields from t
  , 80 fields from the lateral view
from
(
  select
    10 fields 
  from
              e (800M rows, 7GB of data, 1 bucket)
    LEFT JOIN m (1M rows, 20MB )
    LEFT JOIN c (2k rows, <1MB)
    LEFT JOIN contact (150M rows, 283GB, 4 buckets)
) t
LATERAL VIEW
    json_tuple (80 fields) as lv

If I remove the LATERAL VIEW, the query completes. If I add the LV, I always end up with:

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1516602562532_3606_2_03, diagnostics=[Task failed, taskId=task_1516602562532_3606_2_03_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_e113_1516602562532_3606_01_000008 finished with diagnostics set to [Container failed, exitCode=255. Exception from container-launch.
Container id: container_e113_1516602562532_3606_01_000008
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
    at org.apache.hadoop.util.Shell.run(Shell.java:844)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
]], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)

I have tried many things:

  • Updating all the tez.grouping.* settings.
  • Adding WHERE conditions in the JOINs
  • Adding set hive.auto.convert.join.noconditionaltask = false; to make sure it does not try to do a map join
  • Adding [...] on different columns to prevent possible skew
  • Setting mapred.map.tasks = 100

I have maxed out all the java-opts memory settings.
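For readers who want to reproduce the attempts listed above, here is a rough sketch of how such session settings are typically written in a Hive-on-Tez session. The property names are standard Hive/Tez properties, but every value except the two quoted in the post is an illustrative assumption, not the author's actual configuration.

```sql
-- Settings quoted in the post
SET hive.auto.convert.join.noconditionaltask=false;  -- used in the post to avoid a map join
SET mapred.map.tasks=100;

-- Split grouping: smaller sizes generally produce more, smaller map tasks.
-- Values are assumed for illustration (bytes).
SET tez.grouping.min-size=16777216;
SET tez.grouping.max-size=134217728;

-- Container and heap sizing; values are assumed for illustration.
-- hive.tez.container.size is in MB; the heap is usually kept below the container size.
SET hive.tez.container.size=4096;
SET hive.tez.java.opts=-Xmx3276m;
```

Suitable values depend on the YARN limits of the cluster, so these numbers should be treated as placeholders.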

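I need to keep the LATERAL VIEW because some of its fields may be used for filtering (i.e., I can't just do some nice string manipulation to output a CSV-like table).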
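To make that requirement concrete, here is a minimal sketch of a LATERAL VIEW json_tuple whose extracted fields are also used in a filter. Table e comes from the post; the columns id and json_payload and the JSON keys country and status are hypothetical names chosen only for illustration, and two fields stand in for the 80 in the real query.

```sql
-- Hypothetical names: e.id, e.json_payload, JSON keys 'country' and 'status'
SELECT
    t.id,
    lv.country,
    lv.status
FROM (
    SELECT e.id, e.json_payload
    FROM e
) t
LATERAL VIEW json_tuple(t.json_payload, 'country', 'status') lv AS country, status
-- Filtering on an extracted field is why plain string manipulation is not enough
WHERE lv.country = 'FR';
```

json_tuple returns each requested key as a string column, so filters on numeric fields would need an explicit cast.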

Is there a way to make the lateral view fit in memory, or to split it across multiple mappers? Here is the Tez UI view:

[Image: Tez UI view]

HDP 2.6, 8 data nodes with 32 GB RAM

0 answers:

No answers