I am running EMR with 10 worker nodes, each with 8 vCPUs and 15G of RAM. The input file is about 7G.
Here is the mapper/reducer configuration:
SET job.name 'correlations';
SET pig.exec.reducers.bytes.per.reducer 2147483648;
SET pig.exec.reducers.max 60;
SET pig.splitCombination true;
SET mapred.min.split.size 268435456;
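As I understand it, with these settings Pig estimates the reduce parallelism as min(pig.exec.reducers.max, ceil(total input bytes / pig.exec.reducers.bytes.per.reducer)), which for my ~7G input works out to roughly min(60, ceil(7 GB / 2 GB)) = 4 reducers for the first shuffle. I could also pin the parallelism explicitly instead of relying on that estimate, e.g. (the value below is purely illustrative, not something I have tuned):
SET DEFAULT_PARALLEL 40;  -- illustrative value only; overrides Pig's input-size-based reducer estimate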
The job ran for almost 10 hours and then failed. I am looking for help optimizing it and fixing this error, as I am relatively new to EMR. The same script works on HDP in a Rackspace environment with the same configuration (there it used to take 3-4 days to complete).
Here is the complete Pig script:
-- run from gateway node
-- SET DEFAULT_PARALLEL 100;
SET job.name 'correlations';
SET pig.exec.reducers.bytes.per.reducer 2147483648;
SET pig.exec.reducers.max 60;
SET pig.splitCombination true;
SET mapred.min.split.size 268435456;
-- load votes from HDFS and do self join
votes1 = LOAD '<input file path>/filtered_votes.txt' USING PigStorage(',') AS (uid1: long, lnid1: long, t1: int);
votes2 = LOAD '<input file path>/list_item_correlation/filtered_votes.txt' USING PigStorage(',') AS (uid2: long, lnid2: long, t2: int);
pairs = JOIN votes1 BY uid1, votes2 BY uid2;
-- eliminate self and symmetric correlations
required_pairs = FILTER pairs BY (lnid1 < lnid2);
flags = FOREACH required_pairs GENERATE lnid1, lnid2,
( (t1 == 1 AND t2 == 1) ? 1 : 0 ) AS uu,
( (t1 == 1 AND t2 == 0) ? 1 : 0 ) AS ud,
( (t1 == 0 AND t2 == 1) ? 1 : 0 ) AS du,
( (t1 == 0 AND t2 == 0) ? 1 : 0 ) AS dd;
grouped_flags = GROUP flags BY (lnid1, lnid2);
counted = FOREACH grouped_flags GENERATE group AS ids,
SUM(flags.uu) AS suu,
SUM(flags.ud) AS sud,
SUM(flags.du) AS sdu,
SUM(flags.dd) AS sdd;
-- restrict to items with at least 30 common voters
-- (use 0 when testing)
fltrd = FILTER counted BY (suu + sud + sdu + sdd >= 30);
-- avoid divide by 0 errors when computing odds ratios below
corr1 = FOREACH fltrd GENERATE ids.lnid1 AS lnid1, ids.lnid2 AS lnid2,
MAX(TOBAG(suu, 1L)) AS uu,
MAX(TOBAG(sud, 1L)) AS ud,
MAX(TOBAG(sdu, 1L)) AS du,
MAX(TOBAG(sdd, 1L)) AS dd;
-- symmetric pair
corr2 = FOREACH fltrd GENERATE ids.lnid2 AS lnid1, ids.lnid1 AS lnid2,
MAX(TOBAG(suu, 1L)) AS uu,
MAX(TOBAG(sdu, 1L)) AS ud,
MAX(TOBAG(sud, 1L)) AS du,
MAX(TOBAG(sdd, 1L)) AS dd;
-- union
correlations = UNION corr1, corr2;
-- generate vote counts
vote_flags = FOREACH votes1 GENERATE lnid1, (t1 == 1 ? 1 : 0) AS up, (t1 == 0 ? 1 : 0) AS dn;
grpd = GROUP vote_flags BY lnid1;
vote_counts = FOREACH grpd GENERATE group AS lnid, SUM(vote_flags.up) AS up, SUM(vote_flags.dn) AS dn;
-- JOIN vote counts and correlations
jnd = JOIN vote_counts BY lnid, correlations BY lnid2;
-- avoid divide by 0 errors
jnd2 = FOREACH jnd GENERATE lnid1, lnid2, uu, ud, du, dd, up, dn,
MAX(TOBAG(dn-ud, 1L)) AS dnud,
MAX(TOBAG(up-uu, 1L)) AS upuu,
MAX(TOBAG(dn-dd, 1L)) AS dndd,
MAX(TOBAG(up-du, 1L)) AS updu;
-- calculate all the odds ratios
odds = FOREACH jnd2 GENERATE lnid1, lnid2, uu, ud, du, dd,
(1.0 * uu * dd) / (ud * du) AS odds,
EXP ( LOG ( (1.0 * uu * dd) / (ud * du) ) + (1.96 * SQRT ( (1.0 / uu) + (1.0 / ud) + (1.0 / du) + (1.0 / dd) )) ) AS high,
EXP ( LOG ( (1.0 * uu * dd) / (ud * du) ) - (1.96 * SQRT ( (1.0 / uu) + (1.0 / ud) + (1.0 / du) + (1.0 / dd) )) ) AS low,
(1.0 * uu * dnud) / (ud * upuu) AS odds_p,
EXP ( LOG ( (1.0 * uu * dnud) / (ud * upuu) ) + (1.96 * SQRT ( (1.0 / uu) + (1.0 / ud) + (1.0 / upuu) + (1.0 / dnud) )) ) AS high_p,
EXP ( LOG ( (1.0 * uu * dnud) / (ud * upuu) ) - (1.96 * SQRT ( (1.0 / uu) + (1.0 / ud) + (1.0 / upuu) + (1.0 / dnud) )) ) AS low_p,
(1.0 * du * dndd) / (dd * updu) AS odds_n,
EXP ( LOG ( (1.0 * du * dndd) / (dd * updu) ) + (1.96 * SQRT ( (1.0 / du) + (1.0 / dd) + (1.0 / updu) + (1.0 / dndd) )) ) AS high_n,
EXP ( LOG ( (1.0 * du * dndd) / (dd * updu) ) - (1.96 * SQRT ( (1.0 / du) + (1.0 / dd) + (1.0 / updu) + (1.0 / dndd) )) ) AS low_n;
STORE odds INTO '<output location>';
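For reference, the high/low bounds computed above are intended to be the usual 95% confidence interval for the log odds ratio (the same formula the EXP/LOG/SQRT expressions implement):

\mathrm{OR} = \frac{uu \cdot dd}{ud \cdot du}, \qquad
\mathrm{CI}_{95\%} = \exp\!\left(\ln(\mathrm{OR}) \pm 1.96\,\sqrt{\frac{1}{uu}+\frac{1}{ud}+\frac{1}{du}+\frac{1}{dd}}\right)

odds_p and odds_n follow the same pattern with the counts (dn-ud), (up-uu), (dn-dd) and (up-du) substituted in.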
Error from the EMR system logs:
............
............
2017-07-06 11:26:11,346 INFO org.apache.tez.common.counters.Limits (PigTezLauncher-0): Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=120
2017-07-06 11:26:11,351 INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob (PigTezLauncher-0): DAG Status: status=FAILED, progress=TotalTasks: 196 Succeeded: 61 Running: 0 Failed: 1 Killed: 134 FailedTaskAttempts: 73 KilledTaskAttempts: 143, diagnostics=Vertex re-running, vertexName=scope-615, vertexId=vertex_1499292782644_0001_1_00
Vertex re-running, vertexName=scope-622, vertexId=vertex_1499292782644_0001_1_02
Vertex re-running, vertexName=scope-615, vertexId=vertex_1499292782644_0001_1_00
Vertex re-running, vertexName=scope-622, vertexId=vertex_1499292782644_0001_1_02
Vertex re-running, vertexName=scope-619, vertexId=vertex_1499292782644_0001_1_01
Vertex re-running, vertexName=scope-622, vertexId=vertex_1499292782644_0001_1_02
Vertex re-running, vertexName=scope-615, vertexId=vertex_1499292782644_0001_1_00
Vertex re-running, vertexName=scope-622, vertexId=vertex_1499292782644_0001_1_02
Vertex re-running, vertexName=scope-615, vertexId=vertex_1499292782644_0001_1_00
Vertex failed, vertexName=scope-623, vertexId=vertex_1499292782644_0001_1_03, diagnostics=[Task failed, taskId=task_1499292782644_0001_1_03_000034, diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Container container_1499292782644_0001_01_000255 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 killed, TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher {scope_615} #7
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: scope_615: Shuffle failed with too many fetch failures and insufficient progress!failureCounts=12, pendingInputs=12, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:977)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:718)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:376)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:260)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:178)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:191)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:54)
... 5 more
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher {scope_615} #7
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: scope_615: Shuffle failed with too many fetch failures and insufficient progress!failureCounts=12, pendingInputs=12, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:977)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:718)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:376)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:260)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:178)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:191)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:54)
... 5 more
], TaskAttempt 4 killed, TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher {scope_615} #2
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed 15 times trying to download from scope-615_000020_00. threshold=15
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isAbortLimitExceeedFor(ShuffleScheduler.java:740)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:930)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:718)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupLocalDiskFetch(FetcherOrderedGrouped.java:696)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:175)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:191)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:54)
... 5 more
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher {scope_615} #2
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed 15 times trying to download from scope-615_000020_00. threshold=15
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isAbortLimitExceeedFor(ShuffleScheduler.java:740)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:930)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:718)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupLocalDiskFetch(FetcherOrderedGrouped.java:696)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:175)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:191)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:54)
... 5 more
], TaskAttempt 6 failed, info=[Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher {scope_615} #3
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed 15 times trying to download from scope-615_000003_00. threshold=15
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isAbortLimitExceeedFor(ShuffleScheduler.java:740)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:930)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:718)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupLocalDiskFetch(FetcherOrderedGrouped.java:696)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:175)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:191)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:54)
... 5 more
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher {scope_615} #3
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:301)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed 15 times trying to download from scope-615_000003_00. threshold=15
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isAbortLimitExceeedFor(ShuffleScheduler.java:740)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:930)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:718)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupLocalDiskFetch(FetcherOrderedGrouped.java:696)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:175)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:191)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:54)
... 5 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:51, Vertex vertex_1499292782644_0001_1_03 [scope-623] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=scope-624, vertexId=vertex_1499292782644_0001_1_04, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:40, Vertex vertex_1499292782644_0001_1_04 [scope-624] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-634, vertexId=vertex_1499292782644_0001_1_05, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:33, Vertex vertex_1499292782644_0001_1_05 [scope-634] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-622, vertexId=vertex_1499292782644_0001_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1499292782644_0001_1_02 [scope-622] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-615, vertexId=vertex_1499292782644_0001_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:9, Vertex vertex_1499292782644_0001_1_00 [scope-615] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:4, counters=Counters: 60
org.apache.tez.common.counters.DAGCounter
NUM_FAILED_TASKS=73
NUM_KILLED_TASKS=213
NUM_SUCCEEDED_TASKS=117
TOTAL_LAUNCHED_TASKS=315
RACK_LOCAL_TASKS=42
AM_CPU_MILLISECONDS=2655350
AM_GC_TIME_MILLIS=14774
File System Counters
FILE_BYTES_READ=2494679851
FILE_BYTES_WRITTEN=4847503086
FILE_READ_OPS=0
FILE_LARGE_READ_OPS=0
FILE_WRITE_OPS=0
S3_BYTES_READ=11014106890
S3_BYTES_WRITTEN=0
S3_READ_OPS=0
S3_LARGE_READ_OPS=0
S3_WRITE_OPS=0
org.apache.tez.common.counters.TaskCounter
NUM_SPECULATIONS=71
REDUCE_INPUT_GROUPS=320893
REDUCE_INPUT_RECORDS=320958
COMBINE_INPUT_RECORDS=0
SPILLED_RECORDS=721418120
NUM_SHUFFLED_INPUTS=494
NUM_SKIPPED_INPUTS=0
NUM_FAILED_SHUFFLE_INPUTS=0
MERGED_MAP_OUTPUTS=494
GC_TIME_MILLIS=113975
CPU_MILLISECONDS=4706890
PHYSICAL_MEMORY_BYTES=41962962944
VIRTUAL_MEMORY_BYTES=212529692672
COMMITTED_HEAP_BYTES=41962962944
INPUT_RECORDS_PROCESSED=364674343
INPUT_SPLIT_LENGTH_BYTES=11013650258
OUTPUT_RECORDS=512016052
OUTPUT_BYTES=10892611492
OUTPUT_BYTES_WITH_OVERHEAD=5467887018
OUTPUT_BYTES_PHYSICAL=2515253426
ADDITIONAL_SPILLS_BYTES_WRITTEN=899759152
ADDITIONAL_SPILLS_BYTES_READ=2332686174
ADDITIONAL_SPILL_COUNT=56
SHUFFLE_CHUNK_COUNT=78
SHUFFLE_BYTES=4850739
SHUFFLE_BYTES_DECOMPRESSED=7035446
SHUFFLE_BYTES_TO_MEM=4223057
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_DISK_DIRECT=627682
NUM_MEM_TO_DISK_MERGES=0
NUM_DISK_TO_DISK_MERGES=0
SHUFFLE_PHASE_TIME=161720
MERGE_PHASE_TIME=167514
FIRST_EVENT_RECEIVED=3448
LAST_EVENT_RECEIVED=154880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
org.apache.hadoop.mapreduce.TaskCounter
COMBINE_INPUT_RECORDS=191969
COMBINE_OUTPUT_RECORDS=147020816
2017-07-06 11:26:11,379 INFO org.apache.hadoop.conf.Configuration.deprecation (PigTezLauncher-0): fs.default.name is deprecated. Instead, use fs.defaultFS
2017-07-06 11:26:11,430 INFO org.apache.pig.tools.pigstats.JobStats (PigTezLauncher-0): using output size reader: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader
2017-07-06 11:26:11,553 WARN org.apache.pig.tools.pigstats.JobStats (PigTezLauncher-0): unable to find the output file
java.io.FileNotFoundException: File does not exist.
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:972)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:914)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:337)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader.getOutputSize(FileBasedOutputSizeReader.java:81)
at org.apache.pig.tools.pigstats.JobStats.getOutputSize(JobStats.java:351)
at org.apache.pig.tools.pigstats.tez.TezVertexStats.addOutputStatistics(TezVertexStats.java:324)
at org.apache.pig.tools.pigstats.tez.TezVertexStats.accumulateStats(TezVertexStats.java:207)
at org.apache.pig.tools.pigstats.tez.TezDAGStats.accumulateStats(TezDAGStats.java:238)
at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:187)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:210)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-07-06 11:26:12,493 WARN org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher (main): Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-07-06 11:26:12,505 INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats (main): Script Statistics:
HadoopVersion: 2.7.3-amzn-2
PigVersion: 0.16.0-amzn-0
TezVersion: 0.8.4
UserId: hadoop
FileName: <path>/list_item_correlations.pig
StartedAt: 2017-07-06 02:04:37
FinishedAt: 2017-07-06 11:26:12
Features: HASH_JOIN,GROUP_BY,FILTER,UNION
Failed!
DAG 0:
Name: PigLatin:correlations-0_scope-0
ApplicationId: job_1499292782644_0001
TotalLaunchedTasks: 315
FileBytesRead: 2494679851
FileBytesWritten: 4847503086
HdfsBytesRead: 0
HdfsBytesWritten: 0
SpillableMemoryManager spill count: 0
Bags proactively spilled: 0
Records proactively spilled: 0
DAG Plan:
Tez vertex scope-615 -> Tez vertex scope-619,Tez vertex scope-623,
Tez vertex scope-619 -> Tez vertex scope-634,
Tez vertex scope-622 -> Tez vertex scope-623,
Tez vertex scope-623 -> Tez vertex scope-624,
Tez vertex scope-624 -> Tez vertex scope-634,
Tez vertex scope-634
Vertex Stats:
VertexId Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
scope-619 19 19 0 320958 320893 7701880 9427573 0 0 jnd,vote_counts GROUP_BY
Failed vertices:
VertexId State Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
scope-615 KILLED 26 26 147020816 0 294041632 1003592335 1936995311 0 0 grpd,pairs,vote_counts,vote_flags,votes1 MULTI_QUERY
scope-622 KILLED 26 26 217653527 0 217653527 1483385636 2901080202 0 0 pairs,votes2
scope-623 FAILED 52 52 0 0 0 0 0 0 0 counted,flags,grouped_flags,pairs,required_pairs HASH_JOIN
scope-624 KILLED -1 40 0 0 0 0 0 0 0 corr1,corr2,counted,fltrd,jnd GROUP_BY,MULTI_QUERY
scope-634 KILLED -1 33 0 0 0 0 0 0 0