Druid indexing task fails with an OutOfMemory error

Asked: 2018-03-27 21:53:15

Tags: druid

I created a Druid cluster and submitted an indexing task. There appears to be reducer skew: the indexing task gets stuck at reduce 99% and then fails with the following error.

2018-03-27T21:14:30,349 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 96%
2018-03-27T21:14:33,353 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 97%
2018-03-27T21:15:18,418 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 98%
2018-03-27T21:26:05,358 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 99%
2018-03-27T21:37:04,261 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 100%
2018-03-27T21:42:34,690 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1522166154803_0010_r_000001_3, Status : FAILED
Container [pid=111411,containerID=container_1522166154803_0010_01_000388] is running beyond physical memory limits. Current usage: 7.9 GB of 7.4 GB physical memory used; 10.8 GB of 36.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1522166154803_0010_01_000388 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 111411 111408 111411 111411 (bash) 1 2 115810304 696 /bin/bash -c /usr/lib/jvm/java-openjdk/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Xmx6042m -Ddruid.storage.bucket=dish-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1522166154803_0010/container_1522166154803_0010_01_000388/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1522166154803_0010/container_1522166154803_0010_01_000388 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dyarn.app.mapreduce.shuffle.logger=INFO,shuffleCLA -Dyarn.app.mapreduce.shuffle.logfile=syslog.shuffle -Dyarn.app.mapreduce.shuffle.log.filesize=0 -Dyarn.app.mapreduce.shuffle.log.backups=0 org.apache.hadoop.mapred.YarnChild 10.176.225.139 35084 attempt_1522166154803_0010_r_000001_3 388 1>/var/log/hadoop-yarn/containers/application_1522166154803_0010/container_1522166154803_0010_01_000388/stdout 2>/var/log/hadoop-yarn/containers/application_1522166154803_0010/container_1522166154803_0010_01_000388/stderr  
    |- 111591 111411 111411 111411 (java) 323692 28249 11526840320 2058251 /usr/lib/jvm/java-openjdk/bin/java -Djava.net.preferIPv4Stack=true Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1522166154803_0010/container_1522166154803_0010_01_000388/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1522166154803_0010/container_1522166154803_0010_01_000388 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dyarn.app.mapreduce.shuffle.logger=INFO,shuffleCLA -Dyarn.app.mapreduce.shuffle.logfile=syslog.shuffle -Dyarn.app.mapreduce.shuffle.log.filesize=0 -Dyarn.app.mapreduce.shuffle.log.backups=0 org.apache.hadoop.mapred.YarnChild 10.176.225.139 35084 attempt_1522166154803_0010_r_000001_3 388 

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

I checked my yarn-site.xml; below is my configuration.

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>241664</value>
  </property>

Below is my indexing configuration. The data I am trying to load is only for 2018-04-04.

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "viewership",
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "event_date",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions": ["network_group","show_name","time_of_day","viewing_type","core_latino","dma_name","legacy_unit","presence_of_kids","head_of_hhold_age","prin","sys","tenure_years","vip_w_dvr","vip_wo_dvr","network_rank","needs_based_segment","hopper","core_english","star_status","day_of_week"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "type" : "longSum",
          "name" : "time_watched",
          "fieldName" : "time_watched"
        },
        {
          "type" : "cardinality",
          "name" : "distinct_accounts",
          "fields" :  [ "account_id" ]
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2017-04-03/2017-04-16" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/user/hadoop/"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 4000000,
        "assumeGrouped": true
      },
      "useCombiner": true,
      "buildV9Directly": true,
      "numBackgroundPersistThreads": 1
    }
  },
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.3", "org.apache.hadoop:hadoop-aws:2.7.3", "com.hadoop.gplcompression:hadoop-lzo:0.4.19"]
}

2 answers:

Answer 0 (score: 1):

I ran into the same problem with a Druid MR job a while back.

The property you set (yarn.scheduler.maximum-allocation-mb: 241664) only caps the largest container YARN may allocate. The problem here is the size of the map/reduce containers actually being requested. Check the values of mapreduce.map.memory.mb / mapreduce.reduce.memory.mb, which otherwise fall back to much smaller defaults. You should also tune the split size to control how much data each container processes.

I used the following "jobProperties" in my Druid index job JSON:

"jobProperties":{
        "mapreduce.map.memory.mb" : "8192",
        "mapreduce.reduce.memory.mb" : "18288",
        "mapreduce.input.fileinputformat.split.minsize" : "125829120",
        "mapreduce.input.fileinputformat.split.maxsize" : "268435456"
}
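For reference, in the asker's spec these properties would go inside "tuningConfig", which accepts a "jobProperties" map for Hadoop index tasks. A minimal sketch using the values from this answer; the "mapreduce.reduce.java.opts" line is my addition, since raising the YARN container limit alone does not raise the reducer JVM heap (-Xmx):

    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 4000000,
        "assumeGrouped" : true
      },
      "useCombiner" : true,
      "buildV9Directly" : true,
      "numBackgroundPersistThreads" : 1,
      "jobProperties" : {
        "mapreduce.map.memory.mb" : "8192",
        "mapreduce.reduce.memory.mb" : "18288",
        "mapreduce.reduce.java.opts" : "-Xmx15g",
        "mapreduce.input.fileinputformat.split.minsize" : "125829120",
        "mapreduce.input.fileinputformat.split.maxsize" : "268435456"
      }
    }

Keeping the heap at roughly 75-80% of the container size leaves headroom for off-heap usage, so the process RSS stays under the YARN limit.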

Answer 1 (score: 0):

You need to increase the memory, or give it more virtual memory. Or, a better approach:

You can use smaller intervals (e.g., one day each) to generate multiple smaller ingestion tasks:

"intervals" : [ "2017-04-03/2017-04-04" ]

and so on for each day, as shown in the sketch below.
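To illustrate, here is the asker's granularitySpec narrowed to a single day; only "intervals" changes, and you would submit one such task per day in the original range:

    "granularitySpec" : {
      "type" : "uniform",
      "segmentGranularity" : "DAY",
      "queryGranularity" : "NONE",
      "intervals" : [ "2017-04-03/2017-04-04" ]
    }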